Arst Arsw: Star Wars in Alphabetical Order

Photo: Father’s Day by Artiee / Flickr

A friend recently lent me the book Uncharted: Big Data as a Lens on Human Culture, which discusses the development of the Google N-Gram Corpus.  After scanning millions of books, Google could not simply make them all freely available because this would essentially be republishing copyrighted works.  Instead, Google has made them all searchable by N-Grams (one-, two-, and three-word phrases, and so on up to n words), which protects the copyrighted works because they are really only viewable in aggregate.  The corpus is, of course, limited in that it only includes books (as opposed to also including magazines, newspapers, oral texts, etc.), but given that it goes back hundreds of years, the size and the scope of the corpus are pretty amazing.
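To make the idea of an N-Gram concrete, here is a minimal sketch in Python.  It is not Google's actual pipeline; the function name ngrams and the sample sentence are made up for illustration, and tokenizing by splitting on spaces is a simplifying assumption.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text (hypothetical helper)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


# A made-up sample sentence, not taken from the corpus.
sample = "the size of the corpus and the scope of the corpus"

# Only aggregate counts are kept, not the running text itself.
for n in (1, 2, 3):
    counts = Counter(ngrams(sample, n))
    print(n, counts.most_common(2))
```

Storing only the counts is what makes the original text hard to reconstruct, which is the point the authors make about viewing the books in aggregate.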

Early on in Uncharted, a book called Legendary Lexical Loquacious Love, a concordance of a romance novel, is affectionately described as a conceptual art piece that helped to inspire the N-Gram Corpus.  In Love, every word from a romance novel is presented in alphabetical order.  So, a word like a, which appears several times in the original source novel, is repeated scores of times.  The authors talk about how different the experience of reading a concordance of a romance novel is from reading the original romance novel, but how the former is compelling in its own way.  For example, they offer the following quote:

beautiful beautiful beautiful beautiful beautiful beautiful beautiful
beautiful beautiful beautiful beautiful beautiful beautiful beautiful
beautiful beautiful beautiful,  beautiful, beautiful, beautiful, beautiful,
beautiful, beautiful, beautiful,” beautiful. beautiful. beautiful.”
beautiful… beautiful…

These 29 occurrences of the word beautiful are, presumably, spread throughout the original novel.  But seeing them juxtaposed next to other words that begin with b (and with the scores of occurrences of the word a) gives you a different perspective on a romance novel.
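For the curious, the basic idea of an alphabetical concordance like this can be sketched in a few lines of Python.  This is only an assumption about how such a list might be built, not the method used for the book; the function name alphabetize and the sample sentence are invented for illustration.

```python
def alphabetize(text):
    """Return every word of text in alphabetical order, keeping duplicates.

    Punctuation stays attached to each word, so "beautiful" and
    "beautiful," are kept as separate forms, as in the quote above.
    """
    return sorted(text.split(), key=str.lower)


# A made-up sentence standing in for a whole novel.
sample = "She was beautiful, and the beautiful garden was a beautiful place."
print(" ".join(alphabetize(sample)))
```

Because nothing is removed, a frequent word like a simply repeats as many times as it occurs in the source.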

What does this have to do with Star Wars?  Great question.  While reading Uncharted, I came across the following YouTube video:

Created by Tom Murphy, the video is “meant to be provocative in its uselessness.”  It took 42 hours to produce the 43-minute video, which is oddly compelling to watch.  In addition to the video, a small data bar at the bottom graphs the frequency of each word, which is also tallied onscreen throughout the video.  It’s a different experience, much like reading a concordance is different from reading the original source text.  For example, the famous scene in which Obi-Wan uses a Jedi mind trick on a couple of Stormtroopers appears in the original movie as follows:

Stormtrooper: Let me see your identification.
Obi-Wan: [with a small wave of his hand] You don’t need to see his identification.
Stormtrooper: We don’t need to see his identification.
Obi-Wan: These aren’t the droids you’re looking for.
Stormtrooper: These aren’t the droids we’re looking for.

(Source: imdb.com)

In Arst Arsw, this interaction is best summarized by the three occurrences of the word identification, which are the only three times that this word appears in the film.  Identification appears at 16:08 of the video.  There are many other interesting moments, particularly when different voices utter the same word several times (for example, leader by several rebel pilots) or when only one character uses the same word several times (for example, kid by Han Solo.)  For me, longer words are generally more interesting because they take longer to say, whereas the shorter words can fly by so quickly that they can be hard to comprehend.  One exception, however, is the word know, all 32 occurrences of which fly by in under 5 seconds.  But because the 26th know is so emphatic, it stands out against the rest.

I’m not sure if there are any other video concordances out there, but if there are, I would love to see them.  Especially if the original source material is as compelling as the original Star Wars.

Filed under Inspiration

Data is Beautiful

Visualization of how often “language” is a tag in TED Talks.

I’ve mentioned data visualizations in several previous posts, so it may not be surprising that I’m writing about a trove I’ve recently found: the dataisbeautiful subreddit.  In addition to lots of excellent data visualizations (and some mediocre ones), there’s lots of interesting discussion, including responses to previous visualizations (for example, compare this early version of “How we die” to this follow-up).

One I just came across is someone asking about a pattern in some data, specifically why Google searches for “1990s” peak in May of almost every year.  Other decades follow the same pattern.  Several correlates are suggested (high school reunions, for example) but it turns out that high school proms look like the best correlate.  So, 1950s, 1960s, 1970s, 1980s, and, yes, 1990s, seem to be heavily-Googled prom themes.

If you’re not familiar with Reddit, this is a great subreddit to jump into.  One of the key features of Reddit is that users can vote content up or down, which means that the best content rises to the top (though the definition of “best” is open to the interpretation of every user.)  It’s free to join and not even an email address is required.  You can lurk for a while, simply up / downvote, or jump right into conversations with people from across the internet on almost every conceivable topic, including the data visualizations in dataisbeautiful.

Filed under Resources

More Reaction GIFs for the ESL Classroom

tom brady no high 5

I’ve written about using reaction GIFs in the classroom before, but a few collections recently caught my eye.  A reaction GIF is a small, animated image that typically summarizes a mood or feeling more quickly or succinctly than words can.  For example, in the image above, quarterback Tom Brady unsuccessfully searches for a teammate to high five.  Many of us can probably relate to this situation; even if you’ve never been left hanging for a high five, this GIF can still be a metaphor for other times in your life in which the people surrounding you are unable or unwilling to share in your excitement.

The following links to Reddit contain a treasure trove of reaction GIFs.  Note that, like anything on the internet, some of the content may not be safe for work (NSFW).  Depending on the student population you work with, you may want to preview this material before you use any of these reaction GIFs in your classroom.  As I wrote in my previous post, these GIFs can serve as excellent starting points for student discussions, writing activities, and more.

If you could sum up your life in a GIF, what would it be? – In this Reddit forum, Redditors post their reaction GIF responses to this question.  As you click through them, you’ll notice themes of self-deprecating humor and a bit of depression becoming the common refrain.  Many of these GIFs summarize a generally frustrated attitude, which can be interesting.

GIFs as comments collection – This is a collection of comment / reaction GIFs.  Many of the posts have links to multiple GIFs.  Lots of general and generic internet forum reactions here.

Retired GIF – This is a subreddit in which Redditors post links to conversation threads in which a GIF has been posted as a response in the “most appropriate context conceivable.”  Each link will take you to the conversation including the GIF and the context in which it was used.  If you’re not familiar with how GIFs are used as part of online discussions, this will get you acquainted very quickly.

Filed under Resources

Corpus Tools for English Teachers

typesetting letters for a printing press

I recently attended Ohio University’s annual CALL Conference where I discovered a handful of interesting corpus-based resources worth blogging about.  Most of these come from Chris DiStasio’s presentation “How Corpus-based Tools Can Benefit Your ESL Classroom” and from my subsequent exploration of them.

Corpus of Contemporary American English (COCA) – COCA is a huge (450 million words and counting) balanced corpus to which 20 million words have been added each year since 1990.  The interface takes some getting used to, but it is quite powerful.  You can search for frequency of words, frequency of collocates, structures based on part-of-speech, and much, much more.  One of the instructors in the highest level of our program asks his students to do searches based on the words in their vocabulary book.  From the collocates, they can identify the most frequent prototype strings or chunks.  These often sound far more native-like than what students (and in many cases, vocabulary textbook authors) come up with.  If you haven’t yet, take a few minutes (or hours) and explore COCA.
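If you are wondering what a collocate search does under the hood, the core idea can be sketched in Python as counting the words that appear near a target word.  This is a toy illustration and not how COCA itself works; the function name collocates, the window size, and the sample text are all assumptions made for this example.

```python
from collections import Counter


def collocates(tokens, target, window=2):
    """Count the words within `window` positions of each use of target."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the target word itself
                    counts[tokens[j]] += 1
    return counts


# A made-up mini corpus; a real one would be millions of words long.
sample = "make a decision then make a mistake then make a decision".split()
print(collocates(sample, "make").most_common(3))
```

With a large enough corpus, the words at the top of such a list are the chunks that students can compare with their own phrasing.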

Word and Phrase.info – This site, which Chris shared in his presentation, at first seems to be the COCA corpus with a simplified interface.  But in addition to being a simpler way to query the COCA corpus, texts can be uploaded and analyzed based on the use of high frequency words (the 500 most frequent, the next 2500 most frequent, the least frequent, and “academic” words; a note on this last set is below), each of which is then linked to examples in the COCA corpus.  This can be a very useful tool for students who want a quick snapshot of how their writing compares to a target sample.  For example, if they aspire to be published in a given academic journal, they can upload an article (or several articles) from that journal and compare the analysis to their own writing.  As with the COCA interface, there are lots of other features that warrant further exploration.

Academic Vocabulary Lists – My curiosity about what Word and Phrase.info defined as an “academic” word led me to this site, which describes how the Academic Vocabulary List (AVL) was created.  Like the Academic Word List (AWL) that Averil Coxhead developed in 2000, the AVL is a corpus-based list of vocabulary words that appear with higher frequency in academic texts.  In both cases, high frequency words are first omitted, leaving only academic words.  But whereas Coxhead built her own 3.5 million-word academic corpus and omitted the General Service List (GSL), a list that has been around since 1953, the AVL is based entirely on the 120 million-word academic portion of the COCA corpus.  Its creators claim better coverage of the COCA academic corpus (14%) compared to the AWL (7.2%).  And although I find this logic a bit circular (How could a list based on a given corpus not cover that corpus better than a list that is based on a different corpus?), the development of a more recent (2013) list of academic vocabulary is intriguing.
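Coverage figures like these are, at their simplest, the share of the running words in a corpus that appear on the list.  Here is a minimal sketch in Python, assuming plain word-by-word matching; the actual AVL and AWL studies work with lemmas or word families, so this is an illustration rather than their method.  The function name coverage, the sample sentence, and the tiny word list are made up.

```python
def coverage(tokens, word_list):
    """Return the fraction of tokens that appear in word_list."""
    if not tokens:
        return 0.0
    listed = {word.lower() for word in word_list}
    hits = sum(1 for token in tokens if token.lower() in listed)
    return hits / len(tokens)


# Made-up sample text and list, for illustration only.
sample = "the analysis of the data supports the hypothesis".split()
tiny_list = {"analysis", "data", "hypothesis"}
print(f"{coverage(sample, tiny_list):.1%}")
```

Running the same function with two different lists over the same text gives the kind of side-by-side comparison described above.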

Just The Word.com – This is another resource described by Chris in his presentation.  This website, based on the 80 million-word British National Corpus (BNC), offers an even simpler, Google-inspired interface.  The user enters a word or phrase in the search box and clicks on one of three buttons: Combinations, which provides collocates; Alternatives from Thesaurus, which links to the phrase with one or more words replaced with synonyms to show the strength of the links between words in the original phrase; and Alternatives from Learner Errors, which purports to link to actual user errors, but I wasn’t able to see much difference between this and Alternatives from Thesaurus.  Although simpler, this tool took me a few tries to get the hang of.  For example, Alternatives from Thesaurus only works with phrases, which I did not immediately realize.  But aside from this initial learning curve, this tool is a very straightforward way for students to easily search for collocates and to learn more about the nativeness of their word choices.  And, like Word and Phrase.info, search results are linked to the corpus for quick and easy access to multiple authentic examples.

If you use these tools, use them in ways other than I’ve described, or know of others, let me know in the Comments.

Filed under Resources

Building a New Language

throne

I’ve somehow managed to avoid the pop cultural phenomenon that is Game of Thrones. I’m aware that it exists, and that it’s adapted from a series of fantasy novels, but I’ve never seen an episode.  An awareness of the show is hard to avoid.  For example, one of my favorite podcasts, Nerdist, hosted by Chris Hardwick, references it all the time.  I bring this up because one of the recent guests on the podcast was David J. Peterson, a linguist who created Dothraki, the language that is used by characters in Game of Thrones.  (Actually, as Peterson explains, George R. R. Martin, the author of the novels, invented the language and then Peterson had to flesh it out further, develop the phonology, etc.)

So, if you’re interested in linguistics and Game of Thrones (or either of these things), you will probably enjoy Nerdist episode #502, in which Peterson goes into depth on creating Dothraki and several other topics.  Please note, as often happens on the Nerdist, the hosts and guests occasionally drop an F-bomb or two out of enthusiasm, which means that the entire episode may not be appropriate for younger audiences.  Enjoy your burrito!

Filed under Inspiration

Paper-based Games for ESL Students

dice

At the inaugural Playful Learning Summit at Ohio University, I shared a couple of games that I developed for use with ESL students at Ohio State. These are both paper-based games, which stood out in a room full of computer games and an Oculus Rift connected to a Kinect. This last project — an immersive, gesture-controlled, virtual reality interface — was really cool, but isn’t something I know how to develop (yet).  But, fortunately, everyone gets paper.  I hope these two games serve as an inspiration for anyone who doesn’t think she can design a game for her students.

Football Simulation – I’ve posted about this one before, but it still stands as an easy-to-prepare, easy-to-play simulation that can help international students to understand the game of American football.  The focus, when I use the game in the classroom, is to understand what down and distance are as well as the importance of basic offensive and defensive strategies.  All that is required is one six-sided die and a printout of the document with the offense and defense  cards cut out.

Orientation to Campus Game – This is a board game I developed based on the Madeline board game.  Players travel around the campus map / board uncovering tokens when they land next to them.  If the player uncovers one of the 5 buckeye symbols, she keeps it.  If the player uncovers the name of a building, she must move to that space immediately.  The best things about this game are that it is very easy to play and that students really focus and pay attention to the most important buildings on the map.  There are no dice and you can use almost anything for player tokens.  I also really like the mechanic of moving to the place listed on the token because this changes every time the game is played.  On the downside, it is a kids’ game, so it doesn’t hold adults’ attention for very long.  And if the students have been on campus for even a couple of weeks, they are already familiar with most of the buildings in the game.  Still, this game could be useful for students to play while waiting for our orientation program to start because it might help them to discover buildings that they do not yet know.

So, don’t be afraid of developing games on paper if, like me, you don’t have a wide array of programming skills.  Any game that is prototyped and play-tested on paper could later be converted to a computer version.  But, by working out the kinks on paper, you can develop your game to its final version without even picking up your keyboard.

Filed under Projects

Data Visualizations from the New York Times

[Screenshot of an interactive airport traffic visualization]

Everyone loves a good data visualization.  And everyone loves a good data visualization even more if the visualization is interactive.  Unfortunately, I can’t embed an interactive visualization above, but click on it to link to the interactive version.  The circles represent the volume of traffic at airports around the U.S.  Clicking on a circle reveals all of the connecting flights to that airport.  I’m sure you could get this information out of some kind of heinous Excel spreadsheet, but this format is way more engaging.

This is why I was attracted to this year’s Wherry Lecture, which is hosted by the Departments of Statistics and Psychology at Ohio State.  The speaker was Amanda Cox from the New York Times’ graphics department, who spoke about the Times’ use of data visualizations.  Amanda shared many examples that illustrated the importance of context, how a good visualization sometimes limits the amount of data in order to highlight patterns, and the importance of how the text and the visuals work together.  These are a few of my favorites.

The Jobless Rate for People Like You – Not all groups have felt the recession equally.  This visualization allows you to view trends in different demographics.  The differences can be startling.

One Report, Diverging Perspectives – Employment numbers with “Democrat” and “Republican” buttons that allow you to view the same data through different lenses.

Over the Decades, How States Have Shifted – A look at how each state has voted – Democratic or Republican – with connections to every election since 1952.

Counties Blue and Red, Moving Right and Left – Imagine a map of the wind blowing across the U.S.  Now instead of that wind representing, well, wind, imagine it representing the change in vote margin between Democratic and Republican presidential candidates.

Mapping America: Every City, Every Block – Based on U.S. Census data from 2005 to 2009, you can choose to represent ethnicity, income, housing, education, and other information on a map and then zoom out to view the entire nation or zoom in to view your neighborhood.

All of these examples provide different paths to understanding the data that is represented.  To see some of the other examples in this lecture, check out my Twitter stream (@eslchill) or follow the New York Times Graphics Department (@NYTgraphics).

Filed under Inspiration