Switchboard Dialog Act Corpus

A dataset containing 1,150 conversations among 440 speakers of American English. More information on the metadata in this dataset can be found at https://catalog.ldc.upenn.edu/docs/LDC97S62/swb1_manual.txt.

sdac

Format

A data frame with 223,606 rows and 20 variables:

doc_id: ID for each conversation document
damsl_tag: DAMSL dialog act annotation labels
speaker: Label for each speaker in the conversation
turn_num: Number of contiguous utterance turns for a given speaker
utterance_num: The cumulative number of utterances in the conversation
utterance_text: The actual dialog utterance
speaker_id: Unique speaker identification code
sex: Sex of the speaker
birth_year: Year that the speaker was born
dialect_area: Region of the US where the speaker spent their first 10 years
education: Highest educational level attained
ti: ...
payment_type: Form of payment for participation
amt_pd: Payment amount for participation
remarks: Misc. comments
calls_deleted: ...
speaker_partition: ...
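As a rough illustration of how the variables above fit together, the sketch below tallies DAMSL labels and per-speaker utterance counts, treating each row as one utterance. It uses only the Python standard library and assumes the data frame has been exported to a CSV file (for example with `write.csv(sdac, "sdac.csv", row.names = FALSE)` in R) whose column names match the variable names listed here. `load_rows`, `damsl_tag_counts`, and `utterances_per_speaker` are hypothetical helpers, not part of the package.

```python
import csv
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Mapping

Row = Mapping[str, str]


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV export of the sdac data frame (hypothetical file name/path).

    Assumes the header row uses the variable names documented above,
    e.g. doc_id, damsl_tag, speaker, utterance_text.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def damsl_tag_counts(rows: Iterable[Row]) -> Counter:
    """Count rows (utterances) per DAMSL dialog act label."""
    return Counter(row["damsl_tag"] for row in rows)


def utterances_per_speaker(rows: Iterable[Row]) -> Dict[str, Counter]:
    """For each conversation (doc_id), count rows by speaker label."""
    result: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        result[row["doc_id"]][row["speaker"]] += 1
    return dict(result)


if __name__ == "__main__":
    # Toy rows invented for illustration only; they are not taken from the corpus.
    toy = [
        {"doc_id": "doc1", "speaker": "A", "damsl_tag": "qy"},
        {"doc_id": "doc1", "speaker": "B", "damsl_tag": "sd"},
        {"doc_id": "doc1", "speaker": "B", "damsl_tag": "sd"},
    ]
    print(damsl_tag_counts(toy).most_common())
    print(utterances_per_speaker(toy))
```

With a real export, `damsl_tag_counts(load_rows("sdac.csv")).most_common(10)` would list the ten most frequent dialog act labels. Plain dictionaries and `Counter` keep the sketch dependency-free; a data-frame library would do the same grouping in fewer lines.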

Source

https://catalog.ldc.upenn.edu/docs/LDC97S62/
