Father’s Day by Artiee / Flickr
A friend recently lent me the book Uncharted: Big Data as a Lens on Human Culture, which discusses the development of the Google N-Gram Corpus. After scanning millions of books, Google could not simply make them all freely available because this would essentially be republishing copyrighted works. Instead, Google has made them all searchable by N-Grams (one-, two-, three-word phrases and so on up to n-words), which protects the copyrighted works because they are really only viewable in aggregate. The corpus is, of course, limited in that it only includes books (as opposed to also including magazines, newspapers, oral texts, etc.), but given that it goes back hundreds of years, the size and the scope of the corpus are pretty amazing.
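To make the idea of an N-Gram concrete, here is a minimal Python sketch. It is not how Google builds its corpus; the function names `ngrams` and `ngram_counts` are my own, and splitting on whitespace is a simplifying assumption. It only shows how a text can be reduced to counts of short phrases, so that the aggregate can be searched without exposing the full text.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text, in order."""
    # Assumption: words are separated by whitespace and case is ignored.
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts(texts, n):
    """Tally n-grams across many texts, keeping only the aggregate counts."""
    counts = Counter()
    for text in texts:
        counts.update(ngrams(text, n))
    return counts


if __name__ == "__main__":
    print(ngrams("let me see your identification", 2))
    # ['let me', 'me see', 'see your', 'your identification']
```

Once the counts are tallied, the original word order beyond n words is gone, which is the sense in which the books are only viewable in aggregate.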
Early on in Uncharted, a book called Legendary Lexical Loquacious Love, a concordance of a romance novel, is affectionately described as a conceptual art piece that helped to inspire the N-Gram Corpus. In Love, every word from a romance novel is presented in alphabetical order. So, a word like a, which appears several times in the original source novel, is repeated scores of times. The authors talk about how different the experience of reading a concordance of a romance novel is from reading the original romance novel, but how the former is compelling in its own way. For example, they offer the following quote:
beautiful beautiful beautiful beautiful beautiful beautiful beautiful
beautiful beautiful beautiful beautiful beautiful beautiful beautiful
beautiful beautiful beautiful, beautiful, beautiful, beautiful, beautiful,
beautiful, beautiful, beautiful,” beautiful. beautiful. beautiful.”
beautiful… beautiful…
These 29 occurrences of the word beautiful are, presumably, spread throughout the original novel. But seeing them gathered together, alongside other words that begin with b (and after the scores of occurrences of the word a), gives you a different perspective on a romance novel.
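An alphabetical concordance like this is simple to sketch in Python. The version below is my own illustration, not the method used for Love: the function name `concordance` is hypothetical, and unlike the book it strips punctuation rather than keeping it attached to each word.

```python
import re


def concordance(text):
    """Return every word in text, repeats included, in alphabetical order."""
    # Assumption: a word is a run of letters or apostrophes; case is ignored.
    words = re.findall(r"[a-z']+", text.lower())
    return sorted(words)


if __name__ == "__main__":
    print(" ".join(concordance("A beautiful day, a beautiful night.")))
    # a a beautiful beautiful day night
```

Because repeats are kept rather than collapsed, a frequent word shows up as a long run of identical entries, just like the block of beautiful above.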
What does this have to do with Star Wars? Great question. While reading Uncharted, I came across the following YouTube video:
Created by Tom Murphy, the video, Arst Arsw, presents the original Star Wars with every spoken word sorted into alphabetical order, and is “meant to be provocative in its uselessness.” It took 42 hours to produce the 43-minute video, which is oddly compelling to watch. A small data bar at the bottom of the screen graphs the frequency of each word, and a running count is tallied onscreen throughout the video. It’s a different experience, much like reading a concordance is different from reading the original source text. For example, the famous scene in which Obi-Wan uses a Jedi mind trick on a couple of Stormtroopers appears in the original movie as follows:
Stormtrooper: Let me see your identification.
Obi-Wan: [with a small wave of his hand] You don’t need to see his identification.
Stormtrooper: We don’t need to see his identification.
Obi-Wan: These aren’t the droids you’re looking for.
Stormtrooper: These aren’t the droids we’re looking for.
(Source: imdb.com)
In Arst Arsw, this interaction is best summarized by the three occurrences of the word identification, which are the only three times that this word appears in the film. Identification appears at 16:08 of the video. There are many other interesting moments, particularly when different voices utter the same word several times (for example, leader by several rebel pilots) or when only one character uses the same word several times (for example, kid by Han Solo). For me, longer words are generally more interesting because they take longer to say, whereas the shorter words can fly by so quickly that they can be hard to comprehend. One exception, however, is the word know, all 32 occurrences of which fly by in under 5 seconds. But because the 26th know is so emphatic, it stands out against the rest.
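The onscreen tally is easy to imitate for the short exchange quoted earlier. This is only a rough sketch: the names `DIALOGUE` and `word_counts` are mine, I have left out the speaker labels and the stage direction, and I am assuming that a word is any run of letters or apostrophes.

```python
import re
from collections import Counter

# The spoken lines from the mind-trick scene, without speaker labels.
DIALOGUE = """
Let me see your identification.
You don't need to see his identification.
We don't need to see his identification.
These aren't the droids you're looking for.
These aren't the droids we're looking for.
"""


def word_counts(text):
    """Count how many times each word occurs, ignoring case and punctuation."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


if __name__ == "__main__":
    counts = word_counts(DIALOGUE)
    print(counts["identification"])  # 3
    print(counts["droids"])          # 2
```

Run over a full transcript instead of five lines, the same count would give the per-word totals that the video displays.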
I’m not sure if there are any other video concordances out there, but if there are, I would love to see them. Especially if the original source material is as compelling as the original Star Wars.