I spent much of my youth listening to hip hop, or, as it was called back then, rap music. This was long before MP3 players and long before you could Google your favorite song lyrics. It was also long before I knew anything about textual analysis, let alone thought about using unique words per n words as a measure of vocabulary variety.
So, when Matt Daniels published this piece called The Largest Vocabulary in Hip Hop last month, it was both a flashback to the music of my youth and a flash-forward to some of my current interests in corpus linguistics.
Daniels does a very nice analysis, so I won’t repeat much of it here. Just follow the link and scroll down to see the details. Be aware that some of the analysis incorporates a bit of slang, so it may not be completely kid-friendly.
Most noteworthy in the analysis are the two baselines of comparison: Shakespeare (5,170 unique words per 35,000 words) and Herman Melville (6,022 unique words in the first 35,000 words of Moby Dick). Of the 85 rappers analyzed, 16 use a wider vocabulary than Shakespeare and 3 are above Melville. So, if you ever thought all hip hop was a simplistic art form, you may want to take another look. It’s amazing what an analysis of the data can show us.
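For anyone curious how a "unique words per n words" figure might be computed, here is a rough sketch in Python. It is not Daniels' actual method: the tokenizer (lowercased runs of letters, with an optional internal apostrophe) and the function names `tokenize` and `unique_words_in_first_n` are my own assumptions, and different tokenization choices will give different counts.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens.

    Assumption: a "word" is a run of ASCII letters, optionally followed by
    an apostrophe and more letters (so "don't" stays one token).
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def unique_words_in_first_n(text, n=35_000):
    """Count distinct tokens among the first n tokens of text."""
    first_n = tokenize(text)[:n]
    return len(set(first_n))


if __name__ == "__main__":
    sample = "The cat sat on the mat and the dog sat too"
    # Whole sample: 11 tokens, 8 of them distinct.
    print(unique_words_in_first_n(sample))
    # Only the first 5 tokens ("the cat sat on the"): 4 distinct.
    print(unique_words_in_first_n(sample, n=5))
```

Fixing n matters because longer texts naturally accumulate more distinct words, so comparing authors is only fair when each count is taken over the same number of words.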