Using the Corpus in First-Year Writing: Combining Top-down and bottom-up approaches

I’m very pleased to have a guest blogger, James Garner, detail his approach to using corpus tools in his EAP classroom. You can follow James (full bio below) on Twitter at @ALesl_JamesRG or email him at james.r.garner@gmail.com. Feel free to post questions or comments. Also, if you’d like to be a guest blogger and share a corpus-based lesson, a review of a corpus tool, etc., please do contact me.

Using the Corpus in First-Year Writing: Combining Top-down and bottom-up approaches by James Garner

One of the more common critiques of Data-driven Learning (or corpus-based pedagogy) is that it relies on a bottom-up approach to language processing. Critics claim the approach fails to consider the wider discourse of a text and its relation to the lexicogrammatical choices writers make. This concern is especially important in applying corpus tools to English for Academic Purposes (EAP) classes, where emphasis is often placed on more macro-textual features such as rhetorical moves (e.g. John Swales’ CARS model). How, then, can we reconcile the bottom-up orientation of corpus investigation with the top-down orientation of rhetorical move analysis in an EAP writing class?

A lesson I completed with my English Composition II class this semester was one attempt at this. This class is part of the first-year composition program at Georgia State University and is designed to prepare students for writing at the university level. The specific section I am teaching is a mixture of ESL international students, Generation 1.5 students, and native English-speaking students, in roughly equal numbers. Students were introduced to corpus use within the first few weeks of class.

The lesson I am going to describe today focused on summary-critique writing (e.g. book reviews), one of the major writing assignments in the course. For this assignment I modified a similar task from Maggie Charles’ 2007 article in the Journal of English for Academic Purposes (which I highly recommend). The citation appears at the end of this post so you can find it.

The lesson was divided into two parts over two class sessions: one focusing on top-down analysis and one on bottom-up.

Part 1: Top-down Analysis

Students were given an academic book review and asked to read and analyze it with a group of classmates. They were told to focus on the functions the writer was accomplishing with each paragraph, then each sentence within the paragraph. To help them with this, I wrote on the board “What is the author doing? What is he trying to tell us?” I also spent time with each group of students, asking them guiding questions about the things they were finding. This half ended with a whole-class discussion in which we wrote an outline of the book review that included the moves the writer was making (summary → general or specific praise → critique → possible improvements).

Part 2: Bottom-up Analysis and Corpus Work

Now the “what” was replaced by “how”. I asked students (again working in groups) to find example sentences or phrases where the writer was giving praise to the book, critiquing it, or offering possible improvements that the writer could have made. This again was followed by a whole class discussion of their results. Through this discussion students were able to make a list of words (verbs, adjectives, adverbs) that were used in each function and notice how the choice of words related to the rhetorical move. For example, students noticed that a lot of the words used for critiquing in the essay were more cautious and weak in regard to the claim being made (i.e. hedged) compared to the words used in giving praise. They also noticed how modals were employed to indicate areas the author of the reviewed book could have improved upon.

Following this discussion, students were instructed to choose 5 words from the lists and go to the corpus (either COCA or MICUSP) and investigate their use. I suggested that they look not only at frequency information, but also at the phraseology the items frequently occurred in (e.g. The author fails to note…). I also mentioned that, if they were using MICUSP, they should take a look at where in the text the items were occurring. The homework for this lesson was to write up a very short report of their results and bring it to the next class for discussion.
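For readers who want to see what "frequency plus phraseology" looks like outside a web interface, here is a minimal Python sketch. It is not how COCA or MICUSP work internally, and it is not what the students used; the function names (tokenize, kwic, right_collocates) and the sample sentences are invented for illustration. It simply lists each hit of a word with a few words of context and counts what follows it.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def kwic(tokens, node, window=3):
    """Return (left, node, right) context tuples for every hit of `node`."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

def right_collocates(tokens, node):
    """Count the word immediately following each occurrence of `node`."""
    return Counter(
        tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == node
    )

# Invented sample sentences standing in for real book-review text.
sample = (
    "The author fails to note the limits of the data. "
    "The book fails to address earlier work. "
    "The author could have included more examples."
)

tokens = tokenize(sample)

# Concordance lines: each hit with its left and right context.
for left, node, right in kwic(tokens, "fails"):
    print(f"{left:>25} [{node}] {right}")

# The most frequent words directly after the search word.
print(right_collocates(tokens, "fails").most_common(3))
```

A real corpus interface does far more (lemmatization, part-of-speech tags, genre filters), so treat this only as a picture of the two things students were asked to look at: how often an item occurs and what it tends to occur with.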

Looking back on the lesson, I would say that the students got a lot out of it. In reading their drafts of their summary-critiques, I noticed that many of them not only were at least somewhat able to use the moves structure, but also were using some of the lexicogrammatical structures they had found using the corpus. Issues still remained, but in subsequent drafts they were able to make the necessary improvements. Subsequent discussions with the students also revealed that many of them, including the NS writers, enjoyed the lesson and got something out of it.

As for myself, I feel like this lesson was a good first step in trying to find better ways we can incorporate corpus tools into an EAP writing classroom. When we can link the linguistic with the rhetorical, we can make the insights gained from these lessons more relevant to students’ immediate concerns, increasing not only their buy-in, but also the buy-in of other, possibly more skeptical, writing instructors. Top-down or bottom-up? Why not both?

Charles, M. (2007). Reconciling top-down and bottom-up approaches to graduate writing: Using a corpus to teach rhetorical functions. Journal of English for Academic Purposes, 6, 289-302.

James Garner is a PhD student at Georgia State University in Atlanta, Georgia. His experience teaching English as a second/foreign language includes classes on speaking and listening, general composition and academic writing at various levels in Germany, the USA and South Korea. His research interests include Corpus Linguistics, Data-Driven Learning, and English for Academic Purposes. His e-mail address is

Using AntConc & the COCA for student projects

Tomorrow my 406 (Modern English Grammar) students submit an analysis assignment in which they’ve been asked to build and analyze their own small corpus using AntConc. Earlier in the term, they completed two corpus reports and one analysis assignment using the COCA, COHA, GloWbE, or BNC, although they mostly chose the COCA. I want to detail the corpus projects in hopes they’ll give you some ideas for how you might bring corpus study into your classroom. I’d love to hear your comments.

1) The corpus report: The corpus reports are intended to be informal, somewhat casual investigations of language phenomena the students encounter in their daily lives. I urge them to note interesting phrases or words overheard from friends, classmates, people at Starbucks, etc. I’m mainly trying to scaffold them into a corpus approach and help them see the value of an evidence-based approach to language study. I’m continually urging them to move from prescriptive judgments to more context-sensitive descriptive explanations. Students are generally inventive, and although the course has a grammar focus, I allow more lexically-oriented studies as well. Students this term have investigated uses of feminism/feminist, tested traditional comparative & superlative rules in the BNC and the COCA, investigated the informal pronoun y’all, found the most common pre-modifiers for man & woman, etc.

For the report, I ask students to produce a 1.0-1.5 page write-up of their findings that has the following parts: 1) brief intro to your topic and explanation of why you chose this item 2) clear explanation of the search syntax used 3) findings 4) interpretations, i.e. tell us what it means. Then, on the day the assignments are due, we have what I call a “corpus report roundtable” where all the students share their investigations.

2) The Analysis Assignments: The analysis assignments are similar in structure to the corpus reports but ask the students to pose multiple research questions and probe deeper into their topic. For these papers, students must provide a more detailed introduction of the grammar item being investigated and are encouraged to use grammar texts and articles from class as references. In the first assignment, they extend a corpus report to 3-5 pages following the same template (intro, methods, findings, discussion). For the next assignment, I provide a tutorial on AntConc and encourage them to build their own small, specialized corpus. For this assignment they analyze a spoken discourse event, and for the final project, they study written discourse. I’ve had some great projects, e.g. adjectives of evaluation across Amazon reviews, investigations of adjective & genitive patterns for female characters in Bronte novels, and studies of nominalization in academic writing.
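To give a feel for what a small, specialized corpus study can look like in code rather than in AntConc, here is a rough Python sketch loosely modeled on the nominalization projects mentioned above. Everything in it is an assumption made for illustration: the suffix list is a crude heuristic (it misses plurals and flags some words that are not nominalizations), the file names are hypothetical, and the two "essays" are invented sentences, not student data.

```python
import re
from collections import Counter

# Assumed suffix list: a rough heuristic, not a definition of nominalization.
NOMINAL_SUFFIXES = ("tion", "sion", "ment", "ness", "ity")

def words(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def nominalization_candidates(text, min_length=6):
    """Count tokens that end in one of the assumed nominalizing suffixes."""
    return Counter(
        t for t in words(text)
        if len(t) >= min_length and t.endswith(NOMINAL_SUFFIXES)
    )

def rate_per_thousand(text):
    """Candidate nominalizations per 1,000 words, so texts of different sizes compare."""
    tokens = words(text)
    if not tokens:
        return 0.0
    hits = sum(nominalization_candidates(text).values())
    return 1000 * hits / len(tokens)

# A tiny invented "corpus"; the file names are hypothetical.
corpus = {
    "essay_a.txt": "The implementation of the policy led to a reduction in participation.",
    "essay_b.txt": "They put the policy in place and fewer people took part.",
}

for name, text in corpus.items():
    found = dict(nominalization_candidates(text))
    print(name, found, round(rate_per_thousand(text), 1))
```

With real files, the dictionary could be filled by reading each text file from a folder. The point of the per-thousand rate is the same one students meet in the larger corpora: raw counts are hard to compare when the texts differ in length.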

If you’d like to see the assignment sheets for these activities or have any questions, post here or email me at

Corpus tutorial videos

I made this video and a few others with a friend several years ago. We thought we were so clever to name our YouTube channel “The CALL Station”. Amazing to see that this video on collocations has over 7,000 views and our four videos have over 15,000! We had a good time making the videos and once had big plans to make many more. We were going to add subtitles in multiple languages (the only one we added was Japanese, to the collocations video), teach many more functions, and explain other corpus tools. Maybe one day I’ll come back and update these.

But check out our introduction to collocations. These videos make me cringe a bit, but perhaps you’ll find them useful.

CartoDB, GIS, and Corpus Linguistics

Since reading Hardie & Gregory’s 2011 article titled “Visual GISting”, I’ve been pursuing integrations of GIS and corpus linguistics. My initial maps are not incredibly dynamic (just plots of place names in a corpus), but I’ve had great fun exploring CartoDB and Google’s Fusion Tables. Certainly CartoDB is the better of the two, but it’s possible the options you’ll want will require a fee. Fusion Tables just isn’t close functionally or aesthetically, and I’m hesitant to go too far with it because Google may just suspend the service (remember Google Wave?).

I do think there are interesting research directions to be pursued with GIS and corpus linguistics. I’ve been working to develop what I’m calling a taxonomy of ecological deixis markers. I’m developing a scheme to tag items in a corpus with GPS info and then plot them in a GIS. Yes, place names are the obvious starting point and are somewhat easy to plot. Extracting them and matching them against a gazetteer takes some time, but it is not too difficult. The next level is plotting deixis relationships, or something I’m thinking of as ecological deictic density. The idea is not totally fleshed out, but I’m working with environmental texts and considering variation in the usage of geographical names, geographical terms, cardinals, and other referentials.
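The place-name step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tagging scheme described above: the three-entry gazetteer is invented for the example (coordinates are approximate), the function names are hypothetical, and the matching handles single-word names only. The output is a plain CSV with latitude and longitude columns, which is the general kind of table a mapping tool can plot.

```python
import csv
import io
import re
from collections import Counter

# Toy gazetteer for illustration only; coordinates are approximate.
GAZETTEER = {
    "flagstaff": (35.20, -111.65),
    "tempe": (33.43, -111.94),
    "atlanta": (33.75, -84.39),
}

def place_counts(text, gazetteer):
    """Count tokens in the text that match a gazetteer entry (single words only)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t in gazetteer)

def to_csv(counts, gazetteer):
    """Build CSV text with one row per matched place, ready to load into a mapping tool."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["place", "latitude", "longitude", "count"])
    for place, n in sorted(counts.items()):
        lat, lon = gazetteer[place]
        writer.writerow([place, lat, lon, n])
    return buffer.getvalue()

# Invented sample sentence standing in for corpus text.
sample = "We drove from Tempe to Flagstaff, then flew from Flagstaff to Atlanta."

counts = place_counts(sample, GAZETTEER)
print(to_csv(counts, GAZETTEER))
```

The slow part in practice is the part this sketch skips: multi-word names, ambiguous names, and words that are only sometimes place names all need a fuller gazetteer and some manual checking.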

AZ-TESOL: Corpora in EAP Writing Classrooms

I really enjoyed my talk a few weeks ago at AZ-TESOL in Flagstaff. It was great to have so many teachers in attendance and to get such great feedback about my presentation. I really felt I did my part in removing the mystery from corpus approaches. So often I hear teachers say, “I’ve heard of corpus, but I don’t really know what it is.” Well, I’m trying to change that. In a few weeks I’m speaking to a group of L2 writing teachers on campus and also presenting at the Symposium on Second Language Writing in Tempe. Both talks are focused on corpus approaches in EAP classrooms.

Here are my slides from AZ-TESOL: AZ-TESOL

Using the COCA

I developed this worksheet to be used in a corpus training session in an upper-level undergraduate course on Modern English Grammar. Most of these questions have rather clear answers that all students should obtain if they answer the question successfully. In Leech’s terms, this is a “convergent” corpus activity because the students should “converge” on the same answer. However, other items may produce divergence as students produce different qualitative interpretations of the findings.

Also, while these are developed for what you may consider advanced undergraduates, I believe these activities could be scaled to various audiences. For example, a few of these questions work well with EAP students.

And these questions are designed to be answered by using the COCA at

I’d love to hear your comments.

Here is the handout: COCA Worksheet

A COCA Read Me

I created this document for a corpus tutorial for my Modern English Grammar class. Yes, there are some great help manuals on the website of the Corpus of Contemporary American English (aka the COCA), but my students found this useful. I think it helped them to have the corpus on one side of the screen and the document on the other. Toggling between frames in the corpus can be confusing.

Here’s the document: COCA_ReadMe