Undergrads and Markup: How & Why I Got Involved

I became involved with this project in the sort of roundabout way that seems increasingly common as scholars head online. I happened to see Alex announce on Twitter a Google Doc designed for using TEI in the undergrad classroom. Later that week, when I saw Alex at a Scholars’ Lab get-together, I mentioned how much I liked the idea. And, with that, Alex invited me to join a project that was already well underway.

I was primed to take note of this announcement. For the past few months I had been trying to imagine how one could bring something like TEI markup into the undergraduate literature classroom and to what effect. I had been prompted to begin thinking about bringing this sort of “research” into the undergrad classroom after seeing Gregory Crane speak at the Shape of Things to Come Conference last semester. (Prof. Crane’s remarks as well as the rest of the conference proceedings are available online.) Could undergraduates really do something other than be lectured at, or (in those moments where academics like to think of themselves as especially forward thinking) participate in some sort of student-centered Socratic discussion? Is there a humanities equivalent of the physics lab?

I was not immediately won over by the idea. While Crane spoke with conviction about undergraduates doing translation from classical languages, I wondered whether someone working in early-twentieth-century literature (like myself) could really benefit from this approach. An undergraduate taking Latin could profitably translate some untranslated texts. But what “value” can undergraduates add to English-language texts?

One answer, I think, is markup. Even a great site like The Modernist Journals Project could benefit from some markup. Perhaps, I mused, students could compare the appearance of a poem by Ezra Pound in Blast with another version of the same poem. (To take an immediately interesting example, look at “Fratres Minores”; what, I wonder, do those thick black lines cover?) In effect, students would be creating mini-critical editions of individual poems. Or, perhaps, students could research and provide valuable annotations for the pages full of obscure references in Blast.

These, though, were merely the idle thoughts of an idle graduate student until I bumped into Alex, offering the opportunity to see how a project like this might work in practice.

I remain curious about how successful this sort of project will be in practice. When ENAM4500 had its first class meeting I tried to offer the students a brief pitch for why this experiment was worth participating in: it would provide English majors with some basic digital skills that might be valuable to them (I had this article in the back of my mind); it would also expose students to basic textual scholarship, allowing them to see how the “sausage is made”: how a poem goes from the manuscript to an obscure journal and, finally, to the pages of Norton.

I’m eager to see in the coming weeks how well I can deliver on that promise.


Welcome to Project Tango

Welcome to the Project Tango blog. At the moment the project is in its infant stages, so bear with us.

Tango began as a series of conversations between the NINES group and other parties around the issue of accessibility to out-of-print scholarly works copyrighted before the advent of the internet, but also about the future of books in general. The name for the project came from a conversation between Jerome McGann and Madelyn Wessel, our resident copyright expert. Publishers and scholars must learn how to tango together, quipped Wessel, and the rest brings us here.

I joined NINES in the summer as one of their fellows along with Annie Swafford and Michael Pickard and we were immediately recruited to the Tango project. At the time, Jerome McGann, Andrew Stauffer and Dana Wheeles (@bluesaepe) were in the thick of brainstorming adequate solutions for these out-of-print scholarly works around the usual suspects: production, stewardship and copyright. In the absence of an umbrella institution that could coordinate these issues, the main question was how to resolve the problems in a way that would not depend on such an institution, but that would still revolve around a collectivity. What you see here is the result of our continued conversations and we offer them to the public with a healthy dose of both skepticism and drive. We encourage you to join our conversation.

Storage and Stewardship: The idea is for these texts to be made available to anyone with internet access, free of charge and in their integrity (no need to replicate Google Books, after all). There are many possibilities here, and we are exploring most of them fearlessly. There’s HathiTrust, various university repositories, Open Library, an open subset of JSTOR, et al. We feel safe punting on this stage of the project for now, because we must make sure that copyright and production can be taken care of before we go too far out to sea.

Copyright: To rescue these texts from the copyright limbo they inhabit, made evident by the Google Books affair, we would send a standardized letter to authors with a template attached meant for the publishers. Because contracts signed before the Internet had no proviso for digital publication, we figured a simple addendum in the form of a written agreement between author and publisher would suffice. The allure for the authors is immediate, since their often neglected works would once more reach their community of interest.

For publishers it may not be a bad deal either, since we would add value in the form of proofing and metadata, while reserving for them any profits to be made (from, say, a print-on-demand venture) by adding an exception to the typical Creative Commons attribution, share-alike, non-commercial license. Our target publishers would be small-to-medium sized presses that need help transitioning to the web. In the beginning, the selection of works will necessarily have to come from the scholars themselves, prioritizing in a sense the rescue of works that are pertinent to scholarship today. Nevertheless, we imagine the possibility of a clearinghouse for pre-established lists. Because we are talking about atomized copyright retrieval and, as you will soon see, atomized production, in the end this would only work if the model we are offering spreads across the land, and if that monadic work is joined at the hip to a collectivity of the sort that NINES exemplifies. Which brings me to the most exciting and perhaps the most daring part of the project.


The product so far is simple and familiar: a PDF image with well-proofed, searchable text behind it and companion RDF. Mass production is not that simple. In order to generate enough PDFs of out-of-print works to make the model an efficient way to publish in the long run, the solution had to involve crowd-sourcing of some sort, and this is where McGann had one of those moments he has. We could kill two birds with one stone, he suggested, by using the model to teach undergraduate English majors about digital humanities and textual criticism. I must confess he had me at “teach undergraduate English majors the basics of digital humanities and textual criticism.” And to put a cherry on top, he added, the model should also involve a graduate student (or two) to run digital workshops, effectively involving all levels of the academic food-chain in the larger project of digital humanities. While the professor can go about his business, teaching lit as best suits his fancy, the teaching assistant(s) in the workshops can focus on DH related to the materials. (Jerry’s class is pursuing interpretational method via a performative, recitation-based approach to the study of poetic form and meaning.) I like to think of the workshops as PHYS lab for lit-heads, minus the 1 extra credit-hour.

As we talked more and more about the pedagogical model, and in no small part thanks to the insight of Bethany Nowviskie (@nowviskie), we saw that the production of PDFs would have to make room for the digitization, mark-up and digital analysis of primary sources. With these added techniques the students could get a more complete picture of what it is that we do around here, while broadening their understanding of the readings. If done right, we are offering the next generation an early start on the ongoing mass migration of paper to bytes that another generation of scholars began in the 90s.

We are launching our alpha model in Prof. McGann’s ENAM 4500, English and American Poetry of the Nineteenth Century class this semester. About a week ago, Chris Forster (@cforster) joined the team, and together we will be running the said workshops and writing this blog to serve as both documentation and testament to this adventure. You should be hearing from him soon enough. Together with the blog entries, we will post all materials and tutorials used for the class under a Creative Commons license so feel free to use them as the spirit moves you. In the meantime, let me briefly describe the digital component of the class as it stands.

Digital workshops will be conducted outside of class. Students will be required to come to group workshops for 1 hour a week, which will alternate between instruction and discussion, and for another 4 hours the graduate assistants (Chris and I, in this case) will provide tech support in the form of office hours at the Scholars’ Lab. Our initial estimate is that students will spend an average of 3 hours outside of class every week on workshop tasks. You can read the schedule for the digital workshop and, in essence, the digital skills we will be working with here.

The class work will be divided into 2 student projects corresponding to the midterm and the final. Project alpha is the PDF of the out-of-print scholarly work. Here is where they learn the production of high-end, well-proofed PDFs of secondary sources. The texts for this exercise were chosen carefully to harmonize with the content of the seminar, and students will be reading from them besides reproducing them. Because the proof will be in the proofing, this exercise also allows for the teaching of matters that have fallen under the purview of traditional textual criticism.

The second project is the production of a digital variorum of a small primary source, in this case of a couple of poems by Edgar Allan Poe. In this half, students will get their first introduction to mark-up in the form of a reduced version of TEI. You may have already seen the original TEI handout that I produced while Annie, Mike and I tested out the digitization model during the summer break. Since the original handout was written at a time when we were envisioning the students working only with secondary sources, we will adapt it for primary sources when the time comes.

After the different instantiations of the poems are marked up properly, the students will then be required to run them through the new version of Juxta which is about to be released with spanking new TEI functionality. We expect to teach students not only how to use the software effectively but also how to draw stemmatic conclusions from the comparisons —once again linking the digital component to the practice of textual scholarship.
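
To give a rough sense of what a comparison between two instantiations involves, here is a minimal sketch in Python. It is not Juxta and makes no claim about how Juxta works: it simply uses the standard library's difflib to list the places where two plain-text witnesses disagree, word by word. The function name `variants` and the sample lines are made up for illustration; they are not readings from any actual edition.

```python
import difflib


def variants(base: str, witness: str) -> list[tuple[str, str]]:
    """Return (base reading, witness reading) pairs where the two texts differ.

    An empty string on one side means the other side has words that the
    first one lacks (an addition or an omission).
    """
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Invented sample lines, standing in for two transcriptions of the same verse.
first = "the bird sat upon the door"
second = "the bird perched upon the door"
for base_reading, witness_reading in variants(first, second):
    print(f"{base_reading!r} -> {witness_reading!r}")
```

A list of variants like this is only raw material. Deciding which witness descends from which is still a matter of judgment, which is the part we hope the students will learn.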

So here we are at the University of Virginia, at the beginning of the semester of Prof. McGann’s ENAM4500, American and British poetry in the XIXth Century, and we are about to embark on a fascinating pedagogical/publishing experiment. A few days ago was the first day of class. We have a group of 8 students that we’ve divided into two teams of 4 each. Each team has a leader who is in charge of coordinating the group’s activities and who reports to us. Each team also has a flash drive to help them coordinate their files. As you can see, next week we begin with the ever-dreaded, but oh-so-necessary scanning. Please join us then, as we document the plight of young 20-somethings as they battle against the 300dpi TIFF monster and the gargantuan scanners of the Scholars’ Lab.


Project Tango

Project Tango has two related goals: first, to design a free, crowd-sourced model for producing online open-access to the print archive of humanities scholarly monographs (university press books, in the first instance); and second, to locate the production process within the domain of the traditional courses in college and university humanities degree programs.

The immediate aim is to make accessible an important corpus of scholarly materials that is still in copyright.  Because the user-community for these works is a special scholarly one, we think they ought to be treated quite differently from books produced and marketed to the general population.  We also think there are clear  advantages to be gained if that special community takes a major practical role in the digital dissemination of these works.

The pedagogical effort is to get undergraduate humanities students and faculty in many colleges and universities to begin participating actively and collaboratively in an important long-range scholarly research project: the migration of our paper-based cultural inheritance to an integrated online network. At present this migration is being driven by commercial rather than scholarly or educational agents. The result is either work that does not meet the needs or standards of scholars and educators, or work that, while well-produced, is proprietary and therefore quite expensive and/or more or less access-restricted.

The model is conceived as an opening move for students and faculties in the humanities to begin regular research collaborations. In this case, the research work would focus on a project of fundamental importance to the future of humanities education.

The production model must meet the following requirements:

1.  The digital copies must be fully string-searchable, they must be web-accessible, and they must share a basic set of standard metadata: author, title, date, source, and, we initially think, genre (the last drawn from a small uniform set).

2.  The copies must be proof corrected to scholarly standards for accuracy.

3.  The production model must be simple to implement both administratively and technically.
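
As a rough illustration of the first requirement, the shared metadata could be checked with something as small as the Python sketch below. This is only a sketch under stated assumptions, not the project's actual tooling: the `Record` class, the `problems` function and, above all, the genre list are hypothetical. The project says only that genre would be drawn from a small uniform set, without naming its members.

```python
from dataclasses import dataclass, fields

# Hypothetical genre vocabulary, invented for this example.
GENRES = {"criticism", "biography", "bibliography", "edition"}


@dataclass
class Record:
    """The five metadata fields named in requirement 1."""
    author: str
    title: str
    date: str
    source: str
    genre: str


def problems(record: Record) -> list[str]:
    """Return a list of problems found; an empty list means the record passes."""
    found = [
        f"missing {f.name}"
        for f in fields(record)
        if not getattr(record, f.name).strip()
    ]
    # Only complain about the genre value if one was actually supplied.
    if record.genre.strip() and record.genre not in GENRES:
        found.append(f"unknown genre: {record.genre}")
    return found


# Placeholder values, not a real monograph.
sample = Record("A. Author", "A Title", "1975", "Example University Press", "criticism")
print(problems(sample))
```

Keeping the check this small is in the spirit of the third requirement: a student should be able to run it and read its output without any special training.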

A test trial of the model will be run in one undergraduate English course in the fall term of 2010 at U. of Virginia. It will then be modified as necessary and tested again in several undergraduate courses in the spring term. At that point the model will be evaluated for the next moves.

Various open depositories for the digital works are being considered.

NOTE: The project is named “Tango” because it is imagined as an agreeable dance between the only two parties who have a copyright claim on these works: the university presses and the authors of the works.  The parties are asked to agree to “dance together” and not to step on each other’s toes:  each party is left free to expose the individual monograph online in whatever ways they see fit.
