A dataset containing 15,475 utterances by 44 speakers of American English.

sbc

Format

A data frame with 15,475 rows and 13 variables:

id

ID for each speaker

name

Name of each speaker

gender

Gender of the speaker

age

Age of the speaker at recording

dialect

Self-reported dialect of each speaker

dialect_state

State where each speaker was raised

current_state

State of residence for each speaker at recording

highest_edu

Highest educational degree obtained

years_edu

Number of years of formal education

occupation

Occupation of the speaker at recording

ethnicity

Self-reported ethnicity of each speaker

utterance

Annotated transcription of a speaker's utterance

utterrance_clean

Simplified transcription of a speaker's utterance
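
Examples

With 15,475 rows for 44 speakers, each row presumably holds one utterance, so the speaker-level variables (id, name, age, and so on) repeat across rows. The sketch below illustrates two common first steps under that assumption: counting utterances per speaker and reducing the data to one row per speaker. It runs on a small invented data frame, `toy`, that borrows a few of the documented column names; its values are not taken from the corpus, and `toy`, `utterances_per_speaker`, and `speakers` are hypothetical names.

```r
# Toy stand-in for `sbc`: a few of the documented column names,
# with invented values used purely for illustration.
toy <- data.frame(
  id = c(1, 1, 2, 2, 2),
  name = c("A", "A", "B", "B", "B"),
  age = c(30, 30, 45, 45, 45),
  utterance = c("hello there", "well", "yeah", "okay", "right"),
  stringsAsFactors = FALSE
)

# Count utterances per speaker (assumes one row per utterance).
utterances_per_speaker <- table(toy$name)

# Collapse to speaker-level metadata: one row per speaker.
speakers <- unique(toy[, c("id", "name", "age")])

print(utterances_per_speaker)
print(speakers)
```

If the assumption holds for the real data, replacing `toy` with `sbc` should give 44 rows in `speakers`.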

Source

http://www.linguistics.ucsb.edu/research/santa-barbara-corpus