A dataset built from the corpus files, containing 1,155 conversations among 440 speakers of American English.

sdac_files

Format

A data frame with 223,606 rows and 7 variables:

doc_id

ID for each conversation document

damsl_tag

DAMSL dialog act annotation labels

speaker

Label for each speaker in the conversation

turn_num

Number of contiguous utterance turns for a given speaker

utterance_num

The cumulative number of utterances in the conversation

utterance_text

The actual dialog utterance

speaker_id

Unique speaker identification code
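
To make the column layout concrete, here is a minimal sketch that builds a toy data frame with the same seven variable names and summarises it by speaker. All values in `toy` are invented for illustration and do not come from the corpus. The column types, and the reading of `turn_num` as a counter that increases each time the speaker changes, are assumptions rather than documented behaviour.

```r
# Toy rows mimicking the documented columns. Every value here is invented
# for illustration; none of it is taken from sdac_files.
toy <- data.frame(
  doc_id         = c("doc_a", "doc_a", "doc_a", "doc_a"),
  damsl_tag      = c("qy", "sd", "sd", "b"),
  speaker        = c("A", "B", "B", "A"),
  turn_num       = c(1L, 2L, 2L, 3L),   # assumed: increments on speaker change
  utterance_num  = 1:4,                 # running count within the conversation
  utterance_text = c("Do you cook?", "I do.", "Mostly pasta.", "Uh-huh."),
  speaker_id     = c("s001", "s002", "s002", "s001"),
  stringsAsFactors = FALSE
)

# Number of utterances produced by each speaker label
utt_per_speaker <- table(toy$speaker)

# Number of distinct turns taken by each speaker label
turns_per_speaker <- tapply(
  toy$turn_num, toy$speaker,
  function(x) length(unique(x))
)

print(utt_per_speaker)
print(turns_per_speaker)
```

The same two summaries should carry over to the real data by replacing `toy` with `sdac_files` and grouping by `doc_id` as well, provided the column types match the assumptions above.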

Source

https://catalog.ldc.upenn.edu/docs/LDC97S62/