Category Archives: Resources

The List of Lists

I’ve been tinkering with AntConc, Laurence Anthony’s free concordancer, which has led me down a bit of a rabbit hole of word lists compiled by corpus linguists over the past 60 years.  I’ve listed a few that I’ve used, sometimes within AntConc, to analyze students’ writing.  If you’ve taught students to investigate their linguistic hunches via the Corpus of Contemporary American English (COCA), you might also consider teaching them to load their own writing into a tool like AntConc and analyze it there.  By loading one of the lists below as a blacklist (do not show these words) or a whitelist (show only these words), students can home in on a specific part of their vocabulary.  Most of these lists are available for download, which means you can be up and running with your own analysis very quickly.
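
If you’re curious what a blacklist or whitelist actually does to a word list, here’s a minimal Python sketch of the idea.  It isn’t AntConc’s code, and the two tiny word lists are hypothetical stand-ins for a real list such as the GSL or AWL; the script simply counts the words in a text and then either drops the listed words (blacklist) or keeps only them (whitelist).

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count lowercase word tokens (letters and apostrophes only)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def filter_frequencies(freqs, word_list, mode):
    """Keep only the listed words ("whitelist") or drop them ("blacklist")."""
    listed = {w.lower() for w in word_list}
    if mode == "whitelist":
        return Counter({w: c for w, c in freqs.items() if w in listed})
    if mode == "blacklist":
        return Counter({w: c for w, c in freqs.items() if w not in listed})
    raise ValueError("mode must be 'whitelist' or 'blacklist'")

# Hypothetical essay and word lists, for illustration only.
essay = "The data suggest that the analysis of the data was consistent."
general = ["the", "that", "of", "was"]          # stand-in for a general list
academic = ["data", "analysis", "consistent"]   # stand-in for academic items

freqs = word_frequencies(essay)
print(filter_frequencies(freqs, general, "blacklist").most_common())
print(filter_frequencies(freqs, academic, "whitelist").most_common())
```

Running both filters on the same essay is a quick way to see how much of a student’s vocabulary falls inside or outside a given list.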

The lists (in chronological order):

General Service List (GSL) – developed by Michael West in 1953; based on a 2.5-million-word corpus.  (Can you imagine doing corpus linguistics in 1953?  Much of it must have been done by hand, which is mind-boggling.)  Despite criticism that it is out of date (words such as plastic and television are not included, for example), this pioneering list still provides about 80% coverage of the words in typical English texts.

Academic Word List (AWL) – developed by Averil Coxhead in 2000; 570 words (word families) selected from a purpose-built academic corpus with the 2000 most frequent GSL words removed; organized into nine sublists of 60 and one of 30, sorted by frequency.  Scores of textbooks have been written based on these lists, and for good reason.  In fact, we have found that students are so familiar with these materials that they score disproportionately high on these words compared with other advanced vocabulary.

Academic Vocabulary List (AVL) – 3000 core academic words selected from the 120-million-word academic portion of the 440-million-word Corpus of Contemporary American English (COCA).  This word list includes groupings by word families, definitions, and an online interface for browsing the list or uploading texts to be analyzed according to it.

New General Service List (NGSL) – developed by Charles Browne, Brent Culligan, and Joseph Phillips in 2013; based on the two-billion-word Cambridge English Corpus (CEC); 2368 words that cover 90.34% of the CEC.

New Academic Word List (NAWL) – based on three components: the CEC Academic Corpus; two oral corpora, the Michigan Corpus of Academic Spoken English (MICASE) and the British Academic Spoken English (BASE) corpus; and on a corpus of published textbooks for a total of 288 million words. The NAWL is to the NGSL what the AWL is to the GSL in that it contains the 964 most frequent words in the academic corpus after the NGSL words have been removed.
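
The “remove the general list, then rank what’s left” approach behind the AWL and NAWL can be sketched in a few lines of Python.  This is a simplification under my own assumptions: it ranks individual word forms by raw frequency, while the published lists work with word families or lemmas and apply other criteria, such as how widely a word is spread across subject areas.  The corpus and general list in the example are made up.

```python
from collections import Counter

def build_academic_list(academic_tokens, general_list, size):
    """Rank words by frequency in an academic corpus, skip any word on the
    general list, and return the `size` most frequent words that remain."""
    general = {w.lower() for w in general_list}
    counts = Counter(token.lower() for token in academic_tokens)
    remaining = [word for word, _ in counts.most_common() if word not in general]
    return remaining[:size]

# Made-up academic "corpus" and general list, for illustration only.
corpus = ("the hypothesis and the data support the hypothesis "
          "while the method and data vary").split()
general_list = ["the", "and", "while"]

print(build_academic_list(corpus, general_list, 3))
```

Swapping in a different general list (the GSL instead of the NGSL, say) changes what survives the subtraction, which is one reason the AWL and NAWL differ.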

Filed under Resources

Raw. What is it good for?

When I first came across Raw, a free, online data visualization tool, I channeled my inner Edwin Starr and asked, “What is it good for?”  It turns out the answer is “absolutely everything.”  Or pretty close to it.

Raw is extremely user-friendly.  It’s built on D3.js, a powerful JavaScript visualization library.  If you, like me, haven’t had time to explore D3 in depth (or if, also like me, you’re not sure you have the skills to take it on), Raw greatly simplifies the process.  And all of the data is processed in your browser, which means your data is never copied to or stored on their servers.

So, what can Raw do for you?  Well, take your favorite data set and paste it into the text box (or choose one of the four example data sets provided).  Then choose one of the 15 chart types and drag the columns of your data onto the axes or other options for the chart type you have chosen.  You can repeat this as many times as you like to try your data in different layouts.  Finally, customize your visualization by adjusting its size, scale, and colors before choosing how you want to export the results.  It’s amazingly easy!
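
Raw works from tabular text pasted into that box, one column per field (comma- or tab-separated values, as far as I can tell).  If your records live somewhere else, a minimal Python sketch like this one, using only the standard csv module, produces text you can paste in.  The records and column names are hypothetical.

```python
import csv
import io

# Hypothetical class records: (teacher, student, semester).
records = [
    ("Teacher A", "Student 1", "Fall"),
    ("Teacher B", "Student 1", "Spring"),
    ("Teacher A", "Student 2", "Fall"),
    ("Teacher C", "Student 2", "Spring"),
]

def to_csv(rows, header):
    """Return comma-separated text with a header row, ready to paste."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(records, ["teacher", "student", "semester"]))
```

Each header name then shows up as a column you can drag onto the chart.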

I created the visualization at the top of this post by feeding in some data on teachers (left) and students (right).  The lines connecting them represent the classes each student took with each teacher, with thin lines for one semester and thick lines for the next.  I wanted to explore how students move through our program.  Here, it’s easy to see that most students move up from one level to the next, but some skip levels and some repeat them.  The students and teachers are not arranged in order from lowest to highest level, though this would be possible and might make these trends easier to see.
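
Raw gives you the picture; if you also want numbers to go with it, a short script can tally the same trends.  This minimal sketch assumes levels are numbered (1 being the lowest) and that each student has exactly one level per semester; the student data are invented for illustration.

```python
from collections import Counter

# Hypothetical placements: student -> (level in semester 1, level in semester 2).
levels = {
    "Student 1": (1, 2),
    "Student 2": (2, 3),
    "Student 3": (2, 4),
    "Student 4": (3, 3),
}

def classify(before, after):
    """Label a move between levels: moved up, skipped, repeated, or moved down."""
    step = after - before
    if step == 1:
        return "moved up"
    if step > 1:
        return "skipped"
    if step == 0:
        return "repeated"
    return "moved down"

summary = Counter(classify(before, after) for before, after in levels.values())
print(summary)
```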

There are lots of other options within Raw and, depending on what your data include, some may be more useful than others.  But the beauty of Raw is that you are only a couple of clicks away from any of them, making it very easy to try several visualizations until you find one you like.

Data is Beautiful

Visualization of how often “language” is a tag in TED Talks.

I’ve mentioned data visualizations in several previous posts, so it may not be surprising that I’m writing about a trove I’ve recently found: the dataisbeautiful subreddit.  In addition to lots of excellent data visualizations (and some mediocre ones), there is a lot of interesting discussion, including responses to previous visualizations (for example, compare this early version of “How we die” to this follow-up).

One thread I just came across asks about a pattern in some data: why Google searches for “1990s” peak in May of almost every year.  Searches for other decades follow the same pattern.  Several correlates are suggested (high school reunions, for example), but high school proms turn out to be the best correlate.  So the 1950s, 1960s, 1970s, 1980s, and, yes, the 1990s seem to be heavily Googled prom themes.
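
If you want to check a pattern like this yourself, Google Trends lets you download search interest over time as a spreadsheet.  Assuming you’ve read that into monthly values keyed by year and month (the numbers below are made up), this Python sketch finds the peak month for each year and counts which month wins most often.

```python
from collections import Counter

# Made-up monthly search interest: (year, month) -> value.
interest = {
    (2012, 4): 60, (2012, 5): 95, (2012, 6): 70,
    (2013, 4): 65, (2013, 5): 90, (2013, 6): 72,
    (2014, 4): 80, (2014, 5): 78, (2014, 6): 66,
}

def peak_months(series):
    """Return {year: month with the highest value in that year}."""
    best = {}
    for (year, month), value in series.items():
        if year not in best or value > series[(year, best[year])]:
            best[year] = month
    return best

peaks = peak_months(interest)
print(peaks)
print(Counter(peaks.values()).most_common(1))  # most frequent peak month
```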

If you’re not familiar with Reddit, this is a great subreddit to jump into.  One of the key features of Reddit is that users can vote content up or down, which means that the best content rises to the top (though the definition of “best” is open to each user’s interpretation).  It’s free to join, and not even an email address is required.  You can lurk for a while, simply upvote and downvote, or jump right into conversations with people from across the internet on almost every conceivable topic, including the data visualizations in dataisbeautiful.
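
As a rough illustration of how voting pushes content up, here’s the simplest possible ranking: sort by upvotes minus downvotes.  This is only a sketch with invented vote counts; Reddit’s actual sorting options (such as “hot” and “best”) use more involved formulas that I won’t try to reproduce here.

```python
# Invented posts with up- and downvote counts.
posts = [
    {"title": "Post A", "up": 120, "down": 30},
    {"title": "Post B", "up": 300, "down": 250},
    {"title": "Post C", "up": 80, "down": 5},
]

def rank_by_net_score(items):
    """Sort posts by upvotes minus downvotes, highest first."""
    return sorted(items, key=lambda p: p["up"] - p["down"], reverse=True)

for post in rank_by_net_score(posts):
    print(post["title"], post["up"] - post["down"])
```

Note that Post B has the most upvotes but lands last, because so many users voted it down.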

More Reaction GIFs for the ESL Classroom

I’ve written about using reaction GIFs in the classroom before, but a few collections recently caught my eye.  A reaction GIF is a small, animated image that typically summarizes a mood or feeling more quickly or succinctly than words can.  For example, in the image above, quarterback Tom Brady unsuccessfully searches for a teammate to high five.  Many of us can probably relate to this situation; even if you’ve never been left hanging for a high five, this GIF can still be a metaphor for other times in your life in which the people surrounding you are unable or unwilling to share in your excitement.

The following links to Reddit contain a treasure trove of reaction GIFs.  Note that, like anything on the internet, some of the content may not be safe for work (NSFW).  Depending on the student population you work with, you may want to preview this material before you use any of these reaction GIFs in your classroom.  As I wrote in my previous post, these GIFs can serve as excellent starting points for student discussions, writing activities, and more.

If you could sum up your life in a GIF, what would it be? – In this Reddit forum, Redditors post their reaction GIF responses to this question.  As you click through them, you’ll notice themes of self-deprecating humor and a bit of depression becoming the common refrain.  Many of these GIFs summarize a generally frustrated attitude, which can be interesting.

GIFs as comments collection – This is a collection of comment / reaction GIFs.  Many of the posts have links to multiple GIFs.  Lots of general and generic internet forum reactions here.

Retired GIF – This is a subreddit in which Redditors post links to conversation threads in which a GIF has been posted as a response in the “most appropriate context conceivable.”  Each link will take you to the conversation including the GIF and the context in which it was used.  If you’re not familiar with how GIFs are used as part of online discussions, this will get you acquainted very quickly.

Corpus Tools for English Teachers

I recently attended Ohio University’s annual CALL Conference where I discovered a handful of interesting corpus-based resources worth blogging about.  Most of these come from Chris DiStasio’s presentation “How Corpus-based Tools Can Benefit Your ESL Classroom” and from my subsequent exploration of them.

Corpus of Contemporary American English (COCA) – COCA is a huge (450 million words and counting) balanced corpus to which roughly 20 million words have been added for each year since 1990.  The interface takes some getting used to, but it is quite powerful.  You can search for frequency of words, frequency of collocates, structures based on part-of-speech, and much, much more.  One of the instructors in the highest level of our program asks his students to do searches based on the words in their vocabulary book.  From the collocates, they can identify the most frequent prototype strings or chunks.  These often sound far more native-like than what students (and in many cases, vocabulary textbook authors) come up with.  If you haven’t yet, take a few minutes (or hours) and explore COCA.
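
For anyone wondering what “frequency of collocates” means under the hood, this minimal Python sketch counts the words that appear within a fixed number of words to the left or right of a search word (the node).  The window size and the example sentence are my own assumptions; COCA runs this kind of search over the whole corpus with its own span and sorting settings.

```python
from collections import Counter

def collocates(tokens, node, window=4):
    """Count words within `window` tokens to the left or right of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # don't count the node occurrence itself
                counts[tokens[j]] += 1
    return counts

# Hypothetical sentence built around the vocabulary word "make".
text = "students make a decision then they make an effort and make progress"
print(collocates(text.split(), "make", window=2).most_common(3))
```

Feeding in sentences containing a vocabulary word and looking at its top collocates mirrors the exercise above on a much smaller scale.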

Word and Phrase.info – This site, which Chris shared in his presentation, at first seems to be the COCA corpus with a simplified interface.  But in addition to offering a simpler way to query COCA, it lets you upload texts and analyzes them by frequency band (the 500 most frequent words, the next 2500 most frequent, less frequent words, and “academic” words; a note on this last set is below), with each word linked to examples in COCA.  This can be a very useful tool for students who want a quick snapshot of how their writing compares to a target sample.  For example, if they aspire to be published in a given academic journal, they can upload an article (or several articles from that journal) and compare the analysis to that of their own writing.  As with the COCA interface, there are lots of other features that warrant further exploration.
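
The frequency-band snapshot can be approximated with a short script.  In this minimal Python sketch, each word is assigned to the first band whose list contains it, and anything unlisted falls into “other” (roughly the “least frequent” group).  The tiny band lists are hypothetical stand-ins; the site’s own lists, and how it handles words that belong to more than one group, may differ.

```python
import re
from collections import Counter

def band_profile(text, bands):
    """Return the share of word tokens in each band; unlisted words go to "other".

    `bands` is an ordered list of (name, word_set); each token is assigned
    to the first band whose set contains it.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for name, words in bands:
            if token in words:
                counts[name] += 1
                break
        else:  # no band matched this token
            counts["other"] += 1
    total = len(tokens)
    return {name: counts[name] / total for name in counts} if total else {}

# Hypothetical miniature bands standing in for the real frequency lists.
bands = [
    ("top 500", {"the", "of", "is", "a", "this"}),
    ("501-3000", {"result", "study"}),
    ("academic", {"analysis", "significant"}),
]
print(band_profile("This study is a significant analysis of the result.", bands))
```

Running the same function on a student draft and on a target article gives two profiles that can be compared side by side.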

Academic Vocabulary Lists – My curiosity about what Word and Phrase.info defined as an “academic” word led me to this site, which describes how the Academic Vocabulary List (AVL) was created.  Like the Academic Word List (AWL) that Averil Coxhead developed in 2000, the AVL is a corpus-based list of vocabulary words that appear with higher frequency in academic texts.  In both cases, high-frequency words are first omitted, leaving only academic words.  But whereas Coxhead built her own 3.5-million-word academic corpus and omitted the words on the General Service List (GSL), a list that has been around since 1953, the AVL is based entirely on the 120-million-word academic portion of the COCA corpus.  Its creators claim better coverage of the COCA academic corpus (14%) than the AWL provides (7.2%).  And although I find this logic a bit circular (how could a list based on a given corpus not cover that corpus better than a list based on a different corpus?), the development of a more recent (2013) list of academic vocabulary is intriguing.
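
Coverage figures like these (and the 80% and 90.34% figures for the GSL and NGSL) are usually the percentage of running words in a text that appear on the list.  This minimal Python sketch computes that for individual word forms (the published figures count word families or lemmas), and the toy example illustrates my point about circularity: a list drawn from one text covers that text well and a different text poorly.  The texts and lists are invented.

```python
import re

def coverage(text, word_list):
    """Percentage of word tokens in `text` that appear in `word_list`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    listed = {w.lower() for w in word_list}
    hits = sum(1 for token in tokens if token in listed)
    return 100 * hits / len(tokens)

# Invented texts, each with a tiny list drawn from its own content words.
text_a = "The analysis shows the data, and the analysis supports the method."
text_b = "The theory predicts the outcome, and the evidence confirms the theory."
list_a = ["analysis", "data", "method"]
list_b = ["theory", "outcome", "evidence"]

print(f"list A on text A: {coverage(text_a, list_a):.1f}%")
print(f"list B on text A: {coverage(text_a, list_b):.1f}%")
```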

Just The Word.com – This is another resource Chris described in his presentation.  This website, based on the 80-million-word British National Corpus (BNC), offers an even simpler, Google-inspired interface.  The user enters a word or phrase in the search box and clicks one of three buttons: Combinations, which provides collocates; Alternatives from Thesaurus, which replaces one or more words in the phrase with synonyms to show the strength of the links between the words in the original phrase; and Alternatives from Learner Errors, which purports to draw on actual learner errors, though I wasn’t able to see much difference between its results and those of Alternatives from Thesaurus.  Although the interface is simpler, this tool took me a few tries to get the hang of.  For example, Alternatives from Thesaurus only works with phrases, which I did not immediately realize.  But aside from this initial learning curve, the tool is a very straightforward way for students to search for collocates and to learn more about the nativeness of their word choices.  And, like Word and Phrase.info, search results are linked to the corpus for quick and easy access to multiple authentic examples.
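
The idea behind Alternatives from Thesaurus, swapping in synonyms and checking which version a corpus actually uses, can be sketched like this.  I don’t know how Just The Word computes its results internally, so this is only an illustration: it counts two-word sequences in a toy text and compares the original phrase with its synonym variants.  The synonym list and text are hypothetical.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent word pairs."""
    return Counter(zip(tokens, tokens[1:]))

def compare_alternatives(phrase, synonyms, counts):
    """Swap each word of a two-word phrase for its listed synonyms and
    return (variant, count) pairs, most frequent first."""
    first, second = phrase.split()
    variants = [(first, second)]
    variants += [(s, second) for s in synonyms.get(first, [])]
    variants += [(first, s) for s in synonyms.get(second, [])]
    scored = [(" ".join(v), counts[v]) for v in variants]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical text and synonym list.
text = "heavy rain fell and heavy rain continued while strong rain was rare"
synonyms = {"heavy": ["strong", "big"]}
counts = bigram_counts(text.split())
print(compare_alternatives("heavy rain", synonyms, counts))
```

A variant that never shows up in the corpus (here, “big rain”) is a hint that the word choice may not sound native.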

If you use these tools, use them in ways other than I’ve described, or know of others, let me know in the Comments.

Web 2.0 Tools

Just over a year and a half ago, Betsy Lavolette and Susan Pennestri presented a session at CALICO 2012 called Where’s the Pedagogy in Web 2.0?  In this presentation (available here), Betsy and Susan defined Web 2.0, couched these technologies in a discussion of Bloom’s taxonomy, and proposed curating an evolving list of useful Web 2.0 tools.  Naturally, they did this by crowdsourcing the list via a Web 2.0 tool, the online bookmarking site Diigo.com.

The most amazing part of all of this is that the list is still going strong and now includes over 400 items.  To access the list, go to https://groups.diigo.com/group/calicotools.  Each item has brief notes and several tags, such as reading, writing, listening, and speaking, which can also be used to search for tools within the list.  Just click on a tag to view other resources on the list with the same tag.

To participate and contribute to this list, click on the “Join this Group” button and create a free Diigo account if you don’t already have one.  Diigo is a lot like Delicious.com, but it has a few more features, including the ability to highlight and annotate any web document before sharing it.  Diigo is a tool worth using on its own, but signing up for this group makes the experience even more useful.
