
Corpus Tools for English Teachers


I recently attended Ohio University’s annual CALL Conference where I discovered a handful of interesting corpus-based resources worth blogging about.  Most of these come from Chris DiStasio’s presentation “How Corpus-based Tools Can Benefit Your ESL Classroom” and from my subsequent exploration of them.

Corpus of Contemporary American English (COCA) – COCA is a huge (450 million words and counting) balanced corpus, to which roughly 20 million words have been added each year since 1990. The interface takes some getting used to, but it is quite powerful. You can search for the frequency of words, the frequency of collocates, structures based on part of speech, and much, much more. One of the instructors in the highest level of our program asks his students to run searches on the words in their vocabulary book. From the collocates, they can identify the most frequent prototypical strings, or chunks. These often sound far more native-like than what students (and in many cases, vocabulary textbook authors) come up with. If you haven’t yet, take a few minutes (or hours) and explore COCA.
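If you are curious what a collocate search is actually doing behind the scenes, the sketch below reproduces the basic idea in Python with NLTK. COCA itself is only searchable through its web interface, so the Brown corpus, the four-word window, and the example word "strong" are stand-ins of my own choosing, not anything COCA uses.

```python
# A minimal sketch of a collocate-frequency query, using NLTK's Brown corpus
# as a stand-in for COCA (which has no public API). The window size and the
# example word are illustrative choices only.
import nltk
from nltk.corpus import brown
from collections import Counter

nltk.download("brown", quiet=True)

def collocates(node_word, window=4, top_n=10):
    """Count words appearing within `window` tokens of `node_word`."""
    words = [w.lower() for w in brown.words()]
    counts = Counter()
    for i, w in enumerate(words):
        if w == node_word:
            span = words[max(0, i - window): i] + words[i + 1: i + 1 + window]
            counts.update(t for t in span if t.isalpha())
    return counts.most_common(top_n)

print(collocates("strong"))  # the most frequent neighbours of "strong"
```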

Word and Phrase.info – This site, which Chris shared in his presentation, at first seems to be the COCA corpus with a simplified interface. But in addition to being a simpler way to query the COCA corpus, it lets you upload texts and analyze them by their use of high-frequency words (the 500 most frequent, the next 2,500 most frequent, the least frequent, and “academic” words; a note on this last set is below), each of which is then linked to examples in the COCA corpus. This can be a very useful tool for students who want a quick snapshot of how their writing compares to a target sample. For example, if they aspire to be published in a given academic journal, they can upload an article (or several articles from that journal) and compare the analysis to their own writing. As with the COCA interface, there are lots of other features that warrant further exploration.
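For readers who want a feel for what that frequency-band analysis involves, here is a rough Python sketch. The band cut-offs (top 500, next 2,500) come from the description above, but the frequency ranking is built from NLTK's Brown corpus as a stand-in for COCA's word list, so the percentages it reports will not match the site's.

```python
# A rough sketch of frequency-band profiling: bin each word of a text by its
# rank in a frequency list. The ranking here comes from the Brown corpus,
# standing in for COCA's word list; the cut-offs follow the post above.
import re
from collections import Counter

import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

# Rank every corpus word by frequency (1 = most frequent)
freq = Counter(w.lower() for w in brown.words() if w.isalpha())
rank = {w: i + 1 for i, (w, _) in enumerate(freq.most_common())}

def profile(text):
    """Return the percentage of a text's words in each frequency band."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bands = {"1-500": 0, "501-3000": 0, "3000+": 0}
    for t in tokens:
        r = rank.get(t, float("inf"))
        if r <= 500:
            bands["1-500"] += 1
        elif r <= 3000:
            bands["501-3000"] += 1
        else:
            bands["3000+"] += 1
    total = len(tokens) or 1
    return {band: round(100 * n / total, 1) for band, n in bands.items()}

print(profile("The findings suggest a significant correlation between the two variables."))
```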

Academic Vocabulary Lists – My curiosity about what Word and Phrase.info defined as an “academic” word led me to this site, which describes how the Academic Vocabulary List (AVL) was created. Like the Academic Word List (AWL) that Averil Coxhead developed in 2000, the AVL is a corpus-based list of vocabulary words that appear with higher frequency in academic texts. In both cases, high-frequency words are first omitted, leaving only academic words. But whereas Coxhead built her own 3.5 million-word academic corpus and omitted the General Service List (GSL), a list that has been around since 1953, the AVL is based entirely on the 120 million-word academic portion of the COCA corpus. Its creators claim better coverage of the COCA academic corpus (14%) compared to the AWL (7.2%). And although I find this logic a bit circuitous (how could a list based on a given corpus not cover that corpus better than a list based on a different corpus?), the development of a more recent (2013) list of academic vocabulary is intriguing.
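The “coverage” figures above are simpler than they sound: coverage is just the share of a corpus's running words that belong to the list. The sketch below shows the arithmetic, with the Brown corpus standing in for COCA's academic section and a tiny made-up word list standing in for the AVL or AWL (the real lists also group words into families, which this ignores).

```python
# Coverage = percentage of corpus tokens that appear on a given word list.
# Brown stands in for COCA's academic section; the list below is a tiny
# illustrative sample, not the real AWL or AVL.
import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

sample_list = {"analyze", "concept", "data", "research", "significant", "theory"}

tokens = [w.lower() for w in brown.words() if w.isalpha()]
covered = sum(1 for t in tokens if t in sample_list)
print(f"coverage: {100 * covered / len(tokens):.2f}% of {len(tokens)} tokens")
```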

Just The Word.com – This is another resource Chris described in his presentation. The website, based on the 80 million-word British National Corpus (BNC), offers an even simpler, Google-inspired interface. The user enters a word or phrase in the search box and clicks one of three buttons: Combinations, which provides collocates; Alternatives from Thesaurus, which links to the phrase with one or more words replaced by synonyms to show the strength of the links between the words in the original phrase; and Alternatives from Learner Errors, which purports to link to actual learner errors, though I wasn’t able to see much difference between it and Alternatives from Thesaurus. Although simpler, this tool took me a few tries to get the hang of. For example, Alternatives from Thesaurus only works with phrases, which I did not immediately realize. But aside from this initial learning curve, it is a very straightforward way for students to search for collocates and to learn more about the nativeness of their word choices. And, like Word and Phrase.info, search results are linked to the corpus for quick and easy access to multiple authentic examples.
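As a rough illustration of what Alternatives from Thesaurus is doing, the sketch below swaps an adjective for its WordNet synonyms and counts how often each variant occurs as a bigram. NLTK's Brown corpus and WordNet stand in for the BNC and Just The Word's own thesaurus, so the counts will be far smaller than the site's, but the ranking logic is the same basic idea.

```python
# Swap a word for its synonyms and compare how often each variant occurs in
# a corpus. Brown and WordNet are stand-ins for the BNC and Just The Word's
# thesaurus; "strong tea" is just the classic ELT collocation example.
import nltk
from nltk.corpus import brown, wordnet
from collections import Counter

nltk.download("brown", quiet=True)
nltk.download("wordnet", quiet=True)

bigrams = Counter((a.lower(), b.lower()) for a, b in nltk.bigrams(brown.words()))

def alternatives(adj, noun):
    """Rank the adjective's synonyms by how often they occur with the noun."""
    candidates = {adj} | {
        lemma.name().lower()
        for syn in wordnet.synsets(adj, pos=wordnet.ADJ)
        for lemma in syn.lemmas()
        if "_" not in lemma.name()
    }
    scored = {(c, noun): bigrams[(c, noun)] for c in candidates}
    return sorted(scored.items(), key=lambda kv: -kv[1])

for (first, second), count in alternatives("strong", "tea"):
    print(first, second, count)
```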

If you use these tools, use them in ways other than I’ve described, or know of others, let me know in the Comments.


Edupunk Eye-Tracking = DIY Research

One of my favorite presentations at the 2011 Ohio University CALL Conference was by Jeff Kuhn, who presented a small research study he’d done using an eye-tracking device he put together himself.

If you’re not familiar with eye-tracking, it’s a technology that records what a person is looking at and for how long. In the example video below, which uses the technology to examine the use of a website, the path the eyes take is represented by a line. A circle marks each time the eye pauses, with larger circles indicating longer pauses. This information can be viewed as a session map of all of the circles (0:45) and as a heat map of the areas of concentration (1:15).

This second video shows how the technology can be used in an academic context to study reading. Notice that the reader’s eyes do not move smoothly and that the pauses vary in length.
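Those pauses are what eye-tracking software records as fixations, and turning a raw stream of gaze samples into the circles of a session map is conceptually simple: group consecutive samples that stay close together and time how long the group lasts. The toy Python sketch below does exactly that; the radius and duration thresholds are illustrative guesses, not values from any particular eye-tracking package.

```python
# A toy fixation detector: consecutive gaze samples within a small radius are
# grouped into one fixation, whose duration would set the circle size on a
# session map. Thresholds are illustrative only.
from math import dist

def fixations(samples, max_radius=30, min_duration=100):
    """samples: list of (x, y, t_ms) gaze points, in time order."""
    result, group = [], [samples[0]]

    def flush(g):
        duration = g[-1][2] - g[0][2]
        if duration >= min_duration:
            cx = sum(p[0] for p in g) / len(g)
            cy = sum(p[1] for p in g) / len(g)
            result.append((cx, cy, duration))

    for point in samples[1:]:
        if dist(point[:2], group[0][:2]) <= max_radius:
            group.append(point)
        else:
            flush(group)
            group = [point]
    flush(group)
    return result

# Example: a pause near (100, 100) followed by a jump to (400, 120)
gaze = [(100, 100, 0), (102, 101, 50), (99, 103, 100), (101, 100, 150),
        (400, 120, 200), (402, 121, 250)]
print(fixations(gaze))
```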

Jeff’s study examined the noticing of errors.  He tracked the eyes of four ESL students as they read passages with errors and found that they spent an extra 500 milliseconds on errors that they noticed.  (Some learners are not ready to notice some errors.  The participants in the study did not pause on those errors.)

The study was interesting, but the hardware Jeff built to do it was what completely captivated me. He started by removing the infrared filter from a webcam and mounting the camera on a bike helmet with a piece of scrap metal, some rubber bands, and zip ties. Then he made a couple of infrared LED arrays to shine infrared light toward the eyes being tracked. As that light is reflected by the eyes, it is picked up by the webcam and translated into data by the free, open-source Ogama Gaze Tracker.
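To make the webcam side of this a little more concrete, the sketch below shows the kind of image processing that becomes possible once the infrared filter is gone: the LED arrays' reflection shows up as a small bright blob that software can locate frame by frame. This is not how Ogama works internally; it is just a minimal blob-finding illustration, assuming OpenCV (cv2) and NumPy are available and using a synthetic frame in place of a real webcam image.

```python
# Find the bright infrared reflection in a (synthetic) grayscale frame.
# Illustrative only; Ogama's actual pipeline is more sophisticated.
import cv2
import numpy as np

# Synthetic 480x640 "IR frame" with one bright reflection at (320, 240)
frame = np.zeros((480, 640), dtype=np.uint8)
cv2.circle(frame, (320, 240), 5, 255, -1)

# Keep only the brightest pixels, then take the largest blob
_, bright = cv2.threshold(frame, 200, 255, cv2.THRESH_BINARY)
contours = cv2.findContours(bright, cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)[-2]  # works on OpenCV 3 and 4
(x, y), radius = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
print(f"reflection centre: ({x:.0f}, {y:.0f}), radius {radius:.1f}px")
```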

So, instead of acquiring access to a specialized eye-tracking station costing thousands of dollars, Jeff has built a similar device for a little over a hundred bucks, most of which went to the infrared LED arrays.  With a handful of these devices deployed, almost anyone could gather a large volume of eye-tracking data quickly and cheaply.

Incidentally, if you are thinking there are a few similarities between this project and the Wii-based interactive whiteboard, a personal favorite, there are several: both cut the price of the hardware by a factor of at least ten, and probably closer to one hundred; both use free, open-source software; both use infrared LEDs (though this is mostly a coincidence); both have ties to gaming (the interactive whiteboard is based on a Nintendo controller, and eye-tracking software is being used and refined by gamers to select targets in first-person shooters); and both are excellent examples of the ethos of edupunk, which embraces a DIY approach to education.

Do you know of other interesting edupunk projects?  Leave a comment.


Blank or Blank: a Concordancer Game

This is a 10-minute demo of a web-based game I’ve been thinking about.  At its heart, it is a concordancer, but the game is also a repeatable, user-directed tool that could be used to study many interesting linguistic structures.  It could be used in any language and in other, non-linguistic disciplines.  I’ve also incorporated crowdsourcing and social networking to make it more useful and more fun.  And it’s so simple, it just might work.

Don’t believe me?  Too good to be true?  Perhaps.  Watch the demo and decide for yourself.  Then, share your reaction in the comments.
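In the meantime, since the game is at its heart a concordancer, here is a minimal keyword-in-context (KWIC) sketch of that core idea in Python, with NLTK's Brown corpus standing in for whatever corpus the game would eventually draw on.

```python
# A minimal keyword-in-context (KWIC) concordancer: print each occurrence of
# a keyword with a few words of context on either side. Brown is a stand-in
# corpus; the width and limit are arbitrary choices.
import nltk
from nltk.corpus import brown

nltk.download("brown", quiet=True)

def kwic(keyword, width=5, limit=5):
    """Print up to `limit` concordance lines with `width` words of context."""
    words = list(brown.words())
    hits = 0
    for i, w in enumerate(words):
        if w.lower() == keyword:
            left = " ".join(words[max(0, i - width): i])
            right = " ".join(words[i + 1: i + 1 + width])
            print(f"{left:>40}  [{w}]  {right}")
            hits += 1
            if hits >= limit:
                break

kwic("between")
```

Even a dozen lines like these surface plenty of authentic examples, which is part of why I think something this simple could work as a game.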
