A dataset containing 1,150 conversations among 440 speakers of American English. More information on the metadata in this dataset is available at https://catalog.ldc.upenn.edu/docs/LDC97S62/swb1_manual.txt.

sdac

Format

A data frame with 223,606 rows and 20 variables:

doc_id

ID for each conversation document

damsl_tag

DAMSL dialog act annotation labels

speaker

Label for each speaker in the conversation

turn_num

Number of contiguous utterance turns for a given speaker

utterance_num

The cumulative number of utterances in the conversation

utterance_text

Transcribed text of the dialog utterance

speaker_id

Unique speaker identification code

sex

Sex of the speaker

birth_year

Year that the speaker was born

dialect_area

US region where the speaker spent their first 10 years

education

Highest educational level attained

ti

...

payment_type

Form of payment for participation

amt_pd

Payment amount for participation

remarks

Miscellaneous comments

calls_deleted

...

speaker_partition

...

Source

https://catalog.ldc.upenn.edu/docs/LDC97S62/