Last month I released an OCR edition of the Modern Language Association Job Information List (MLA JIL). After two years running rhetmap.org and batch geocoding the weekly MLA JIL, I’ve looked for new ways to visually understand the current geography and past history of the field. Rhetmap started, first and foremost, as a technology exercise for myself. I wanted to learn more about geocoding large data sets in order to help me with other research projects. Later, rhetmap’s data turned out to be useful for graduate students in a research methods seminar as a meaningful data set for them to code and interpret. In this later use of rhetmap, part of what I’ve tried to do is incorporate rhetmap into my teaching. Thinking about studying rhetoric and composition? Let’s take a look at these maps. Let’s talk about the ratio of doctoral programs to tenure track jobs at the rank of assistant professor. Rhetmap has helped make my conversations about professional development and the state of the field more concrete and strategic.
The site now has a weekly audience of around 300 unique visitors and approximately 27,000 visits since May, 2012. Based on the referral logs, I’ve learned how faculty in writing studies use rhetmap as evidence in conversations inside and outside the discipline. For example, CUNY’s recent “Open Letter from Comp/Rhet area group” references rhetmap’s 2012-2013 job report in an argument for enhancing their composition and rhetoric doctoral concentration. I’ve also received e-mails about other unintended uses, for example a job seeker whose partner used the map to help in their dual career search. I’ve probably learned twice as much from people’s description of how they use rhetmap as I do creating and maintaining the resource. Once I put a new map or updated data set out there, I learn more about what the discipline and job seekers value. For example, I learned that one unanticipated audience for rhetmap turned out to be a job seeker’s family members trying to plan their dual career and relocation plans around a R/C job search.
In the case the OCR edition of the MLA JIL, part of what I intended to do by making 47 years of the MLA JIL searchable as plain text files was to see if I could geocode the #mlajil’s rhetoric and composition job history from 1965-2012 to show if there have been huge geographic shifts from year to year, decade to decade, in our job market. However, geocoding the entire JIL based on OCR scans proved unfeasible due to the low PDF quality of early MLA JIL scans.
When the prospect of batch geocoding proved less likely, I searched the #mlajil for the earliest job advertisements mentioning specific field terminology such as “computer” or “computers and writing,” because I thought that these early mentions in the MLA JIL would indicate moments of transition for our discipline. After I was done searching for individual keywords in the MLA JIL OCR corpus, I chose to release the entire OCRed resource as a tar.gz download. I did this for two reasons. First, I wanted to give other faculty in rhetorical studies the opportunity to do their own research on the OCR, and second I wanted to see what other scholars might do with the resource. My instinct that the resource would have some value outside of my own curiosity was correct. For example, scholars in English Studies such as Jonathan Goodwin have made use of the corpus and have found their own interesting results about gender, Shakespeare, and more.
Although working on rhetmap does produce a few useful deliverables to the field, as I mentioned earlier, my impulse to create the site was motivated by my desire to learn new technologies in order to help with other projects. In this regard, rhetmap projects begin as a sandbox or means of exploring new skills and methods. For example, I became interested geocoding and later the OCR software (tesseract-ocr specifically) because I wanted to learn more about how these packages work and wanted to explore their possible use/value for my Samaritan book project research. In the case of geocoding data, part of my latest book project involves a great deal of geocoding in order to understand historical diasporas of people and texts. In the OCR project, I wanted to learn more about how OCR works in order to explore what’s needed to train tesseract-ocr to deal with a new language/character set for a much larger project dealing with Samaritan script and the OCR of Samaritan manuscripts. As I practice new skills and techniques for larger projects with stakeholders outside the field, I try to develop resources that may be useful to students and faculty inside the field along the way. That said, now that rhetmap has proven useful as a resource, I predict a future challenge will be to sustain rhetmap in the years beyond its experimental phase.
1 Comment
Pingback: Collin Gifford Brooke