The Brown Corpus was the first computer-readable general corpus of texts prepared for linguistic research on modern English. It contains over 1 million words of running text (500 samples of 2,000 or more words each) drawn from edited English prose printed in the United States during the calendar year 1961.

Usage

brown

Format

A data frame with 500 rows and 3 variables:

doc_id

Original file name for each written sample

category

The writing category of each sample

text

The prose for each sample with inline part-of-speech tags
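
Because the text variable carries its part-of-speech tags inline, most analyses begin by separating words from tags. The following is a minimal Python sketch of that step, assuming the tags follow the word/tag convention of the NLTK distribution listed under Source (for example "jury/nn"). The helper name split_tagged and the sample string are hypothetical and for illustration only; the string is not a verbatim row of the data.

```python
from collections import Counter


def split_tagged(text):
    """Split whitespace-separated 'word/tag' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST slash, so a word that itself
        # contains "/" keeps it and only the trailing tag is removed.
        word, sep, tag = token.rpartition("/")
        if sep:  # ignore tokens that carry no tag at all
            pairs.append((word, tag))
    return pairs


# Hypothetical sample in the assumed word/tag format (not real corpus data).
sample = "The/at jury/nn said/vbd the/at election/nn was/bedz fair/jj ./."

pairs = split_tagged(sample)
words = [word for word, _ in pairs]           # plain tokens without tags
tag_counts = Counter(tag for _, tag in pairs)  # frequency of each tag
```

Splitting on the last slash rather than the first is a deliberate choice here: it keeps tokens such as fractions or dates intact if they contain a slash of their own. If the text in this data frame uses a different delimiter, only the argument to rpartition needs to change.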

Source

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/brown.zip

Details

Each of the 500 rows corresponds to one sample from the corpus. For more information, see http://www.helsinki.fi/varieng/CoRD/corpora/BROWN/