Wrapping Up

Some keep the Sabbath…

Like Alex’s team, my group of intrepid students spent the latter part of the semester looking at various versions of a single Emily Dickinson poem; ours was “Some keep the sabbath.” I sent students into the stacks (and onto Google Books) to find various appearances of the poem. In addition to versions appearing in the Norton Anthology of American Literature, the 1890 Poems of Emily Dickinson, the manuscript version in R. W. Franklin’s Manuscript Books of Emily Dickinson, and versions in both Franklin’s and Thomas H. Johnson’s editions of Dickinson’s complete poems, we dug up some odd appearances of the poem as well as its first appearance in 1864 in the Round Table (this latter as a digital image). (Dickinsonians have likely already guessed that we chose “Some keep the sabbath” in part because it was one of the few poems which appeared in print in Dickinson’s lifetime.)

We were less assiduous in our stemmatics than what Alex describes. After encoding and comparing 12 versions (two of which, we discovered, were inadvertently the same edition and state encoded from different copies), our version of the textual history went something like this: the 1890 Poems established a base text which was largely hewn to (with some variations in indentation and punctuation) until Johnson’s edition of the Complete Poems, which restored a version truer to the manuscript. The titles under which “Some keep the sabbath” appeared, however, were certainly interesting: “My Sabbath” (in the Round Table) and “A Service of Song” (in the 1890 Poems of Emily Dickinson) (and “The Service of Song” and “Sabbath” elsewhere, presumably adapting the Round Table and 1890 texts respectively).

This proved to be an interesting exercise, and one which seemed to induce less “soul crushing” than our previous project. Descriptive markup proved engaging enough to students; even when the markup was wrong (an invented tag, for example), the basic thinking was right. The TEI headers proved the most complicated and unpleasant aspect of the process (despite Alex’s heavy annotation of a template file). This was perhaps unsurprising, and itself a valuable lesson in the challenges of uniquely identifying any single “text” or “document” (or encoding thereof).

<hi rend="tepid">Success!</hi>

Alex notes the way in which our digital workshops were super-added to an existing, more traditional, literature course, resulting in some surprise (dismay?) among the students enrolled. The “lab”-style addition Alex proposes, along the lines of the “lab” requirement for science classes, might offer one solution (though I recall my physics lab as a not spectacularly useful addition to my knowledge of nature’s properties). My inclination would be simply to weave a digital project into the warp and woof of the course material, or even to focus an entire class on such a project (or set of projects).

For example, to stick with the Dickinson example, I began to think that rather than examining the print history of a single poem, the class could have encoded all the poems in the 1890 Poems of Emily Dickinson alongside their manuscript versions, in order to examine how Dickinson’s initial reception was affected by Higginson and Todd’s crafting of her work. While tracing the variations in the print history of a single poem is a nice task, the results in the case of Dickinson were rather predictable and didn’t offer many footholds for really interesting interpretation. If students encoded two versions (1890 and manuscript) of a handful of individual poems and then shared the results as a class, one could begin to try to understand what informed the passage of Dickinson’s work from manuscript to print. (Alex’s section and mine operated largely independently; I now wonder if more cross-pollination could have achieved some more synthetic thinking.)

I would describe the digital workshops overall as a “tepid success.” We wanted students to use markup and Juxta to think about print history, and they did. I wanted students to understand the complexities of copyright. They did. A certain sense of dissatisfaction nevertheless remains. For silly reasons that I can’t really articulate, I thought that along with a certain set of skills (markup, collation) and scholarly questions (print history, textual evolution and representation), a digital workshop would spark a process which would not be limited to these (rather traditional) course goals. Somehow, I imagined the self-starter ethos which is so much a part of the attitude of the “digital humanities” (to say nothing of the ideology of Silicon Valley[1]) would inspire a degree of independence and exploration in the members of the group such that we would not merely meet the course goals, but do something more.

Whether a differently organized project within the structure of another class could better spark this sort of involvement, however, is a question which remains to be answered.

[1] Here, in the comfort of a footnote, I’ll raise a question that has been nagging me for some time: is the “culture of the digital humanities” (a weaselly phrase, I know) allied to certain “free market” values? Might these sorts of values put it at odds with a discipline like English, where the term “neoliberal” requires either spitting or the actual smiting of one’s own chest in order to be spoken aloud? (This same tension, it seems to me, is present in the “open source”/“free” software community itself—so often invoked as a model for DH—where Eric Raymond’s vision of “the bazaar” is fundamentally a vision of a free market, rather than the “freedom” advocated by Richard Stallman.)

Posted in Uncategorized | Leave a comment

Wrapping Up

In the best tradition of Digital Humanities projects, the first digital laboratory for Project Tango was both a failure and a success. For me personally, this was the first time teaching a digital humanities introductory course, and in that sense I learned an enormous amount. There were many things that I think we did right and some others that could definitely be done better. Since I am writing this at the end of the calendar year, what better way to wrap things up than with a list of things to work on:

The unexpected DHer

Probably the biggest setback for us this year was working with students who did not expect to take a DH lab. In this regard, I think that we should push for a formalized approach to undergrad DH instruction. Recently, we started conversations with several interested parties in the English Department and the Library to consider what such a thing would look like. The goal is to have a marker in the course offering directory that would let students know what they are signing up for, also serving as an extra incentive for those who are technologically minded. The model of the physics laboratory seems the most appealing here, and we envision an extra-credit digital lab that would supplement the traditional English department course. Hopefully, we will begin talks with the deanery to see where we can go with this.

Not enough work?

At the end of the semester, I felt that there was too much dead time, that we could’ve pushed the envelope a bit more. When we originally designed the lab, we were cautious not to over-assign since we had never done anything like this before. Looking back, I feel that students were more than capable of working on more modules. As you can read below, we worked on four modules, two for each half of the semester: scanning, OCR/Proofing, mark-up and Juxta analysis. I feel there was room for at least two more modules. A digital lab with six modules, three for each half, I think could give the students a more in-depth look at the world of DH. If the labs are designed to be attached to a particular kind of literary course (and I can imagine a few different models), the six modules can be adapted to those particular needs. The pattern would still stay the same, one day of hack, one day of discussion, and so on and so forth.

Le mot juste

The choice of a primary source at the end of the semester worked out really nicely. As sad as it is, I think the original goal of providing a model for undergraduate crowdsourcing fails to meet our pedagogical goals halfway. The students learned very little about the text they scanned and OCRed in that exercise. On the other hand, as you can see from the students’ report, they did engage with the Dickinson poem they marked up and compared using Juxta. Perhaps there are other ways of crowdsourcing with undergraduates, but I think the digital lab must be a separate affair. I was perhaps more optimistic than Chris on this at first, but now that I see the potential of the digital lab itself, I’m willing to concede.

Concept before practice

At THATCampVA, which followed at the end of the semester, we had a chance to debate some of the issues that came up during the class. In one of those debates, Julie Meloni talked briefly about her own models for DH instruction. One of her main points seemed worth exploring further, and given the opportunity to do this again, I would give it a shot. She suggested that DH instruction revolve around several key concepts that can apply to several distinct practical applications. In other words, there might be something to be said for abstracting what these concepts are and then adapting the applications to the particular needs of a given literary class. Since we were not able to get into detail, this I’m sure will be part of the discussions for 2011.

As I said above, I think our petit experiment was both a failure and a success. As of now, I don’t know if this is the last post in this particular venue. If it is, I thank you for following our conversation so far, and I hope to see you in some of our other outlets.

You can visit Chris on his homepage or follow him on twitter. You can also visit my homepage or follow me on twitter.

Posted in Uncategorized | Leave a comment

Second Project Narrative (Alex’s Team)

For our second project we examined multiple editions of Emily Dickinson’s “Success is Counted Sweetest.” In the project we examined eight different editions of the poem, which we marked up in TEI and then compared to each other using software called Juxta. Based on these comparisons we created a stemmatic diagram showing the progression of the poem through its various editions. In creating our stemmatic diagram we tried to decide what the major editorial decisions surrounding the poem were. We came to the conclusion that the two major editorial decisions were the use of dashes vs. commas and stanzas vs. no stanzas. While the text itself remains essentially the same throughout all of the versions, these two editorial decisions have a significant impact on how each version of the poem is read and interpreted. Our task in assessing all of these things was to discover the progression from the manuscript to the final published version (the Norton version) and to discern when certain editorial decisions were made.

Looking first at the manuscript (labeled ‘0’ in our stemmatic diagram and the various critical apparatuses), we saw that Dickinson both used dashes and split the poem into stanzas. As this is the first available version, we set it at the top of our stemmatic diagram and used it as a basis of comparison for the later editions of the poem. Based on the characteristics of the manuscript, we split the seven remaining versions of the poem into four categories according to the editorial decisions of each version.

The first version that we looked at was the ‘1967a’ edition (labeled ‘1’). This version also featured both stanzas and dashes, and was nearly identical to the manuscript, differing in only two places, both minor punctuation differences. In examining the remaining versions of the poem, we saw that the ‘Norton’ edition (labeled ‘1a’) reflected the same editorial decisions as the ‘1967a’ version. From this we concluded that the ‘Norton’ (since it was published after ‘1967a’) was created following the editorial decisions seen in that version.

The second sub-category we found consisted of three editions which contained stanzas but no dashes. The first of these published was the 1892 edition (labeled ‘2’). This edition contained stanzas, like the manuscript, but no dashes; hence it clearly represents a different branch of the poem. Along with the 1892 edition, both the 1901 edition (labeled ‘2a’) and the 1930 edition (labeled ‘2b’) also featured stanzas but no dashes. From this we determined that the decisions made in the 1901 and 1930 editions were based on the 1892 edition of the poem. It is important to note that the 1930 edition may have been based on either the 1892 or the 1901 edition, since both of the preceding editions feature the same major editorial decisions. For our stemmatic diagram, we placed the 1930 edition below the 1901 edition based on the time each was published.

The third sub-category we found was the 1878 edition (labeled ‘3’) which featured no stanzas and no dashes. These editorial decisions are different from those found in the manuscript so we concluded that the 1878 edition represents a third sub-category of the poem.

The remaining edition of the poem, the 1967b edition (labeled ‘4’), contains editorial decisions different from those seen in any of the other editions. The 1967b edition is not split into stanzas, but does contain dashes. Given that this is the only edition of the poem featuring this set of editorial decisions, we concluded that the 1967b edition represents a fourth sub-category of the poem.

In completing this project, we were able to observe one Emily Dickinson poem undergo many editorial changes, which affected the overall interpretation of the poem. Using Juxta to compare versions spanning from the manuscript to the published Norton “Success is counted sweetest,” we saw the way a poem, presented in different forms, is still recognised as the “same” poem. This required us to determine what aspects of a poem make it unique, and what changes can be allowed while still maintaining its essence or meaning. In the end we decided that, because the text remained essentially the same, the exchange of dashes for commas and the choice of breaking the poem into stanzas or not changed the way the poems could be read, but did not create a new poem altogether.

Posted in Uncategorized | 2 Comments

TEI top-down, bottom-up

[post co-authored by Alex Gil and Chris Forster]

Alex: Last week we finally introduced our students to the wonderful world of markup, in our case of the TEI variety. Since I meet with my team on Mondays I got to dip my feet in the cold water first. I’m calling my approach top-down to contrast it with Chris’s approach, which I think we can call bottom-up. I started with the larger structure and then dug my way down to the minutiae, eventually arriving at the text itself, while Chris (who will get a chance to tell his side of the story soon enough) started from the text and moved up the TEI.

My approach, if it can be called mine at all, came to me as a natural offshoot of the TEI handout that Chris and I drafted for the class. I opened with the larger theory of mark-up, pointing out that punctuation is an early form of graphical markup, and then moving on to the present, where mark-up allows our computers to interact more meaningfully with texts. From there I moved to an introduction of TEI through the question of standardization. I wanted my students to understand that TEI is a human agreement on a shared vocabulary more than a precise description of texts. Then I laid the overall TEI structure on them; I must admit I lost a couple of souls when they saw this:

   <teiHeader>[information about the text]</teiHeader>
   <text>[the text itself]</text>

I’m aware that for students who have not seen XML syntax, the first encounter can elicit a brain freeze of sorts, and I thought that fanning out the hierarchy gradually would move my students from simple to complex. I used oXygen to open the nodes one by one going from high to low in the hierarchy, eventually arriving at the <text>.
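To give a sense of that fanning out: opened a few levels further, the skeleton looks roughly like the sketch below. These are the standard TEI element names rather than the exact contents of our class template, and the bracketed text stands in for prose or further elements.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>[title of the encoded text]</title>
      </titleStmt>
      <publicationStmt>
        <p>[who is publishing this encoding, and under what terms]</p>
      </publicationStmt>
      <sourceDesc>
        <p>[the print or manuscript source being encoded]</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <lg type="stanza">
        <l>[first line of verse]</l>
        <l>[second line of verse]</l>
      </lg>
    </body>
  </text>
</TEI>
```

Each placeholder opens, in turn, onto further elements; the <fileDesc> alone accounts for much of the header’s complexity.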

Their assignment for next week is simple. They must each take a different version of the same poem by Emily Dickinson, four texts total, and produce a relatively soft TEI encoding of it by Monday. In the end, I think they were relieved to find out we had created a template for them to use. This week we will get our first results from our students, and I will have a better sense of the success or failure of our one-hour session last week, but after having a chance to see Chris’s approach, I wish I had structured my class a bit differently. I guess I should step aside now and let him tell it like it is.

Chris: Alex’s description of the difference between our approaches as top-down and bottom-up seems apt. And indeed, my chief interest was trying to help students see the sorts of things that we would be interested in marking up, and what the potential advantages of such markup might be. The long, strange history of markup—the arcane passage from SGML to HTML and its various versions; XML; the misadventure of XHTML and the excitement surrounding HTML5, etc. (for a nice review of this history I recommend this wonderfully brisk Brief History of Markup)—is certainly important, but it can be an overwhelming introduction to a process that I like to simply reduce to “description.” So instead of reviewing TEI and its history, I was interested in introducing students simultaneously to the practice of markup (simplistic reduction: angle brackets) and to the process of thinking about what should be marked up (simplistic reduction: … I don’t think there is one).

To this end I appealed to an example that at this point seems like something of a cliché in discussions of markup: the recipe (I drew inspiration from this page in particular, but one finds similar examples all over the place). I presented students with a recipe and asked them to “mark it up”; what types of information does it contain? How would you describe them? How would you divide it up? Students got the general structure pretty well; recipes have titles, descriptions, lists of ingredients, and then a set of instructions. They nicely noted the distinction between the unordered list of ingredients and the ordered list of instructions (they were thinking in HTML terms here; some of my team has had experience with HTML). They did not catch the distinction between the amount of an ingredient (which itself includes the question of what standard of measure we’re using) and the ingredient itself; but that was a helpful opportunity to show a level of markup a little deeper than just basic document structure (inline, rather than block, elements we might say).
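A rough sketch of where that discussion ended up might look like the following. The element names and the recipe here are invented for illustration rather than drawn from any particular schema; the point is that the amount, with its unit of measure, is marked separately from the ingredient itself.

```xml
<recipe>
  <title>Oatmeal Cookies</title>
  <description>A simple drop cookie.</description>
  <!-- an unordered list: the sequence of ingredients doesn't matter -->
  <ingredientList>
    <ingredient><amount unit="cup">2</amount> rolled oats</ingredient>
    <ingredient><amount unit="cup">1</amount> flour</ingredient>
  </ingredientList>
  <!-- an ordered list: the sequence of steps very much matters -->
  <instructions>
    <step n="1">Mix the dry ingredients.</step>
    <step n="2">Bake at 350°F for twelve minutes.</step>
  </instructions>
</recipe>
```

Marked up this way, the amounts become data: doubling the recipe, or merging the ingredient lists of several recipes into a shopping list, is a matter of simple processing rather than rereading.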

I tried to stress what advantages the marked up version of this recipe would have over the plaintext version they had started with. If one had a database of recipes marked up in this way, you could automatically generate a shopping list; you could query your pantry to see what recipes you can make based on the ingredients you have; you could easily double a recipe to make more; and so on. We had already looked at a poem and talked about how we mark it up the previous week, so the transition from describing a recipe to a poem was not as abrupt as it may seem (especially thanks to the great examples at TEI by Example). If we were interested in a poet’s use of rhyme, for example, we might mark rhyme with a high degree of precision, with an eye towards eventually analyzing rhyme data.
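To make the rhyme example concrete: the TEI verse module provides <lg>, <l>, and <rhyme> elements (the last with a label attribute for the rhyme scheme), so a stanza marked for later rhyme analysis might look something like this sketch; the exact attributes one used would of course depend on the project.

```xml
<!-- a sketch: the first stanza of "Success is counted sweetest,"
     with the rhyme scheme recorded and the rhyme words tagged -->
<lg type="stanza" rhyme="abcb">
  <l>Success is counted sweetest</l>
  <l>By those who ne'er <rhyme label="b">succeed</rhyme>.</l>
  <l>To comprehend a nectar</l>
  <l>Requires sorest <rhyme label="b">need</rhyme>.</l>
</lg>
```

With rhyme words tagged this way, a simple query could pull out every rhyme pair in a corpus and begin to tabulate a poet’s rhyming habits.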

This is not an entirely uncontroversial way of explaining things. My focus on what you can do with marked up text appeals to a different set of values than the sheer, rigorous, scholarly description of a textual object. In part this betrays my own inborn tendency towards text analysis; but I think it has pedagogical benefits as well. It helps to illustrate some of the possibilities of markup, and the way in which markup is never (and here, I know, I’m getting a little polemical) simply a matter of objective description.

Posted in Digital Workshop | Leave a comment

First Student Response (Team Alex)

[As part of their scanning and OCR/Proofing exercise our students were asked to write a midterm report. They have given us permission to post their reports here under our CC license. The following is the report from team Alex, word for word]

ENAM Project Tango, 1st Project Report

David McNerney, Laura Borgs, Jordan Bolden and Flora Pulce
4 October 2010

1. Please describe what you did in the process of digitizing this work. What challenges arose in the process and how did you solve them?

In the process of digitizing the work we went through various stages in converting the text to a digital copy. The first of these stages was the scanning phase, in which we assigned each person in the group an equal number of pages to scan (taking into account the cover, back cover, spine, etc.). The scanning process was a relatively simple one, taking each member of the group approximately half an hour to scan their individual pages, and we easily finished the entirety of our scanning in one week.

After scanning we moved into the second phase of digitizing the text, the OCR. We spent almost three whole weeks completing the OCR process. The first step was to convert our scanned images of the text into a readable .pdf document. We did this using the ABBYY FineReader 10 software. In order to divide the work evenly, each group member was responsible for the same pages he or she had scanned. This led us to create four separate FineReader documents, with one group member responsible for each.

From here we moved into the editing phase, which was easily the most difficult part of the project. Throughout the editing phase we had to use ABBYY FineReader 10 to find and edit any mistakes made in the recognition process. While the software was extremely accurate in converting the image to text, there were a number of mistakes which made the editing job a very tedious one. One of the biggest issues was learning to use the OCR software. As undergraduates we had not had previous exposure to any form of OCR software, and learning to use FineReader 10 proved to be difficult. However, the OCR handout was very helpful, and eventually the group was able collectively to figure out the software and make the necessary edits. Within the editing process itself there arose patterns of errors from the OCR software; one of the biggest issues is with hyphens, especially when a word could not fit on the line. It was explained to us why this issue exists, yet it still proved to be a very tedious part of the editing process. Once we finished with the editing, all that was left was to compile the complete text into one large document, something that once again was new to us, but did not prove too difficult to figure out. In summation, the biggest problem we encountered was a general unfamiliarity with the use of OCR software, which led to a few minor problems; but all in all the process was relatively easy, if tedious.

2. What are the advantages of a digital edition of a work such as the one you digitized? What are the disadvantages? Include in your reflections not only a consideration of the differences between a paper edition and a digital edition of the same work, but some consideration of how the broader landscape of digital databases (e.g. Google Books, JSTOR) changes how these works are used.

Nowadays more and more books are going out of print. The world of literature is becoming more and more digitized, and “Digital Humanities” is a new field of study developing out of this digitization, because before looking for a book in a library, people now look the book up on the internet first.

The advantages are obvious: finding a book in an online database gives you access to it more quickly and easily than searching for it in a library, or even having to use the interlibrary loan which many libraries offer, which takes away valuable time you could rather be spending doing more research. Books that are out of print especially should be digitized, since getting access to these kinds of resources is getting harder and harder. Another advantage is that you can search through the PDF file easily and find the paragraphs you are looking for faster.

Taken to an extreme, the book could become a dead medium and libraries may lose their value someday. And by using a PDF file to search a book for information valuable to your research, one might miss other important background knowledge which one could have gained by skimming through the book to find the fitting information for the research.

3. What did you learn? In answering this question try to consider not simply the specific technologies involved (I learned how to use a scanner; I learned how to use ABBYY FineReader, version 10), but the broader issues: what did you learn about books?

For question three we decided it would make more sense if each of us answered the question as it applied to us in order to address the diversity of what we all learned. What follows is each of our responses.

David: In completing this first project I really learned more about the effort that goes into digitizing a scholarly manuscript. Although it would be a quicker process if we didn’t attempt to preserve the look of the original text, through the project I have gained a greater appreciation for this standard. Going into the project I did not possess a true appreciation for putting up a digitized copy of the book as close to the original as possible. In directly working with the text, and digitizing it myself, I definitely feel that I gained a new appreciation for keeping books in their original form (or as close to it as possible) rather than just copying the text into a word document. Furthermore, through the project I gained a greater appreciation for digital books in general. Prior to this project I much preferred working with a hard copy of the text, but in creating a digitized copy I grew to appreciate that specific form of media, and the advantages it provides, much more. I feel that this project provides a very good way to increase awareness of the issues of digitizing texts, while at the same time offering good insight into the value of having a digitized text.

Jordan: Honestly, before this project I had never seriously considered the concept of digital humanities, or the relevance of the rising field of study to my life, beyond my vehement disdain for Kindles and vague condescension toward blogging. I believed in the physical experience reading provides; the few seconds of suspense turning a page creates that are completely absent in digital replications. It would have taken nothing short of working within the field in this way to convince me that it is worth it to even pose the question of the significance of digital humanities. But, thrust into the middle of this experiment, I begin to see how monumental the question of digital humanities could really be.

This project forced me to re-evaluate my definition of a book, and consider whether the text separate of a physical book should be defined differently. This inclination of thought is completely new to me, and opened new avenues with which to regard my studies. Learning to use and working with ABBYY Finereader showed me how much work goes into digitising a book, and the way in which a work can be illuminated by doing so. I may not be sold on the idea that a digitised version of a book is transformed into a new thing altogether, but I can see how useful digitising can be as a tool for preserving literature, making it more widely accessible, and providing a more convenient and effective way to search and use texts for research or study.

Laura: Before Professor McGann’s class I had never heard about Digital Humanities. Literature and the World Wide Web were always two different worlds for me. Of course I searched for books online, but just in order to later find them in a nearby library, flip through the book, and gain more knowledge from it than only the knowledge you were looking for in the first place.

While digitizing a book I learned to appreciate online databases for the work that is put into the digitization of a book, and for the fact that online databases are making books accessible to everyone owning, or having access to, a computer. And to be honest, IT will develop faster and with more attention from today’s society than books will. Taken to an extreme, someday there won’t be any more people writing books, because with electronic devices like the Kindle the book itself, with its pages, front and back cover, and spine, will lose its function – books will be downloaded and immediately go onto your electronic device – books will become ebooks. And you can already buy a book without even holding it in your own hands, flipping through the pages, and acknowledging the work that was put into these 300-plus pages, because now it is down to a megabyte in size.

Flora: This project was a revolutionary idea; it will be helpful for researchers in the humanities. It is another effort to bring together the traditional way of passing on knowledge – through books – and a more modern approach to learning and sharing academic knowledge – the wonderful tool that is the internet. But it was not really interesting to be a part of it in a practical way, because basically we just read instructions from a blog and reproduced them. We did not really face hard situations, or problems, so we had little initiative. I guess in a way that is a good thing: it proved that everything happened according to plan. But still, all of the scanning, OCR, and correction were really mechanical, and I felt it had not a lot in common with being in the humanities and learning literature. We just skimmed through the book without taking a closer look at it. Though I spent almost a month using it, I have barely any idea of what it is about. I am not, in general, very interested in digital editions; I am old-fashioned; I need a book to feel that I am learning. So in my opinion, though a digital edition is much more useful and easy to work with, it will not replace a paper edition. To sum it up, it is a job that has to be done, but it is not an interesting one.

Posted in Students | 1 Comment

Teaching Materiality and Crushing Souls

We’re nearing completion of the first of our two digital workshops: scanning and OCR’ing a scholarly monograph.

As Alex noted, the process of scanning itself seemed to go pretty smoothly. With the work shared by four people, it was not particularly onerous. Students in my group, unlike those in Alex’s, did not work in pairs; but they had no complaints about scanning.

In introducing and discussing the project I tried to bring our group’s attention to the broader issues involved. What, I asked them, about this book would you be interested in preserving? How would you describe this book?

They skipped over what I assumed would be the obvious answers (the title, the text, the author) and began immediately talking about the typography and the binding, though their vocabulary for describing these properties of the book was limited (as, to a hardcore bibliographer, is my own rather basic vocabulary).

I encouraged them to think about how search changes the way we can use a book. I also began to point to the possibilities of large-scale text databases. How does the availability of such resources change the sort of questions we are inclined to ask of the works we read?

(I left off pressing them on questions about what metadata should be attached to a digital edition; that we’ll address soon enough I think.)

While I didn’t draw the comparison explicitly for my students, I see this focus on materiality, on how the book as a physical object mediates the information it communicates, as consistent with what students are doing in their seminar meetings, where much of their time is spent reciting poetry. Prof. McGann has written about the value and importance of recitation as an alternative or supplement to that most traditional of English major seminar activities: interpretation. Focusing on the material properties of the book, I like to think, stems from a similar set of theoretical concerns: about how material form mediates meaning and purpose.

So far, so good.

The process of OCR’ing the book, however, proved challenging in other ways. Alex and I remain frustrated by the problem of how to handle soft hyphens. And the process itself is, by its very nature, more vexing and less pleasant than scanning: while most people were able to scan their pages within an hour or so, the process of OCR’ing took around four hours on average. And those four hours are spent squinting at a screen, comparing a PDF to the recognized text to see whether that period came through as a period or a comma. One student described the experience as “mind numbing.” Another, no doubt with some slight exaggeration, described it as “soul crushing.”
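For readers curious what the soft-hyphen problem looks like in practice, here is a minimal sketch of the naive fix in Python. This is my own illustration, not part of our class workflow, and it shows why the step resists automation: genuinely hyphenated compounds broken at a line end get wrongly fused, so human proofing remains necessary.

```python
import re

def rejoin_soft_hyphens(text: str) -> str:
    """Naively merge words split across line breaks by end-of-line hyphens.

    Caveat: a true hyphenated compound ("self-evident") broken at the end
    of a line will also be fused, so the output still needs proofing.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)

print(rejoin_soft_hyphens("under-\nstand the digi-\ntized text"))
# prints "understand the digitized text"
```

The regex deliberately requires a word character on both sides of the hyphen, so a dash standing alone at the end of a line is left untouched.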

This complaint alone does not necessarily count as a demerit against the project. There is no a priori reason why this sort of time-consuming, “mind numbing” labor might not have educational value. At least some experience of how the sausage gets made seems undeniably valuable. Once you’ve digitized a text, you understand the world of digitized texts in a different way.

And yet I wonder if the decreasing marginal educational value of each additional hour spent doing the OCR can justify the labor involved. Reciting Whitman’s Song of Myself has certain educational benefits; reciting Dickinson’s verse aloud only compounds those benefits. Scanning and OCR’ing 10 pages of text has certain educational benefits as well. I’m less sure that scanning and OCR’ing the next 10 pages, however, carries the same value.

Posted in Digital Workshop | 2 Comments

Scanning Workshop

This week we wrapped up the scanning for the digital workshop. Overall the exercise was a great success. Our fears that the scanning would be too dreadful an assignment because of its mechanical nature were allayed by how little time it took students to complete it: students on my team took an average of 30 minutes each to finish the assignment.

Teams worked in pairs. We planned it this way so that they could help each other if any problems arose before they had recourse to me or Chris. In the end we were right: although I was available in case they needed help, they completed the assignment without it. Another benefit they reported was that keeping each other company made the time go faster and significantly reduced the boredom factor.

On Monday we met to wrap things up. This was our first opportunity to have a class discussion about the larger picture, and it was where I had a chance to introduce them to the idea behind (and blog for) Project Tango.

First order of business was to go over and consolidate the files. The scans were very good, and the students had followed the naming instructions so well that consolidating them amounted to dropping the files into a folder (named TIF, after the file type) and correcting their orientation. Next week, when they begin the OCR process, they will only have to drag and drop these files into the OCR software.
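The consolidation step was pure drag-and-drop, but for the curious the same move can be sketched in a few lines of Python. The folder names below are hypothetical placeholders, not our actual directory layout.

```python
import shutil
from pathlib import Path

# Hypothetical layout: each team's scans sit in their own folder, with
# filenames following the agreed naming convention (0001.tif, 0002.tif, ...).
team_folders = [Path("scans/team1"), Path("scans/team2")]
dest = Path("TIF")  # consolidated folder, named after the file type
dest.mkdir(exist_ok=True)

for folder in team_folders:
    for tif in sorted(folder.glob("*.tif")):
        # Copy rather than move, so the originals survive as a backup;
        # keeping the names intact preserves page order.
        shutil.copy2(tif, dest / tif.name)
```

Because everyone followed the naming convention, no renaming logic is needed; the filenames themselves carry the page order into the consolidated folder.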

Next, we talked about the larger context of books and essays online. I started them off with Google Scholar and from there pointed to some JStor PDF image/texts. I showed them the advantages of being able to search for quotes or keywords, and we talked about questions of accessibility. From there, it was an easy jump to issues of copyright: I explained where we stand on copyright after the advent of the internet, and we talked about the Google Books controversy.

From there, we moved on to talk about Project Tango and about the role of students in creating a viable alternative to the proprietary hurdles that come with the Google territory. It became clear right away that students would like to see this sort of workshop count for extra credit —pretty much the way a physics lab adds an extra credit-hour to your transcript. At this stage the Tango team is in discussions over how best to push this agenda with the deanery. The goal would be twofold: a) to provide that extra credit for students, and b) to advertise the class as having a digital component in the course-offering directory. Classes in the humanities with a digital lab could carry some suffix such as –d, and once students become savvy to the advantages of the skill-set the labs provide, this would be a further incentive to sign up for these classes.

We wrapped up the discussion with a preview of next week’s OCR session. I insisted that OCR must be tied to proofing, and that the quality of our work over the next two weeks will determine the quality of the digital editions we are making.

Since at this stage the goal of this blog is to document our pedagogical/publishing experiment, I took the opportunity at the end of the discussion to invite students to comment on the blog entries and materials. We are now discussing how to encourage them to do so more often by making it part of their grade for the workshop. I have no doubt that soon enough we will start hearing their voices, critical and otherwise. Stay tuned.

Posted in Digital Workshop | 2 Comments

Undergrads and Markup: How & Why I Got Involved

I became involved with this project in the sort of roundabout way that seems increasingly common as scholars head online. I happened to see Alex announce on Twitter a Google Doc designed for using TEI in the undergrad classroom. Later that week, when I saw Alex at a Scholars’ Lab get-together, I happened to mention how much I liked the idea. And, with that, Alex invited me to join a project that was already well underway.

I was primed to take note of this announcement. For the past few months I had been trying to imagine how one could bring something like TEI markup into the undergraduate literature classroom and to what effect. I had been prompted to begin thinking about bringing this sort of “research” into the undergrad classroom after seeing Gregory Crane speak at the Shape of Things to Come Conference last semester. (Prof. Crane’s remarks as well as the rest of the conference proceedings are available online.) Could undergraduates really do something other than be lectured at, or (in those moments where academics like to think of themselves as especially forward thinking) participate in some sort of student-centered Socratic discussion? Is there a humanities equivalent of the physics lab?

I was not immediately won over by the idea. While Crane spoke with conviction about undergraduates translating from classical languages, I wondered whether someone working in early-twentieth-century literature (like myself) could really benefit from this approach. An undergraduate taking Latin could profitably translate some untranslated texts. But what “value” can undergraduates add to English-language texts?

One answer, I think, is markup. Even a great site like The Modernist Journals Project could benefit from some markup. Perhaps, I mused, students could compare the appearance of a poem by Ezra Pound in Blast (to take an immediately interesting example, look at “Fratres Minores”; what, I wonder, do those thick black lines cover?) with another version. In effect, students would be creating mini-critical editions of individual poems. Or perhaps students could research and provide valuable annotations for the pages full of obscure references in Blast.

These, though, were merely the idle thoughts of an idle graduate student until I bumped into Alex, offering the opportunity to see how a project like this might work in practice.

I remain curious about how successful this sort of project will be in practice. When ENAM4500 had its first class meeting I tried to offer the students a brief pitch for why this experiment was worth participating in: it would provide English majors with some basic digital skills that might be valuable to them (I had this article in the back of my mind); it would also expose students to basic textual scholarship, allowing them to see how the “sausage is made”—how a poem travels from manuscript to obscure journal and, finally, to the pages of the Norton.

I’m eager to see in the coming weeks how well I can deliver on that promise.

Posted in General | Leave a comment

Welcome to Project Tango

Welcome to the Project Tango blog. At the moment the project is in its infant stages, so bear with us.

Tango began as a series of conversations between the NINES group and other parties about access to out-of-print scholarly works copyrighted before the advent of the internet, and about the future of books in general. The name for the project came from a conversation between Jerome McGann and Madelyn Wessel, our resident copyright expert. Publishers and scholars must learn how to tango together, quipped Wessel, and the rest brings us here.

I joined NINES in the summer as one of their fellows, along with Annie Swafford and Michael Pickard, and we were immediately recruited to the Tango project. At the time, Jerome McGann, Andrew Stauffer and Dana Wheeles (@bluesaepe) were in the thick of brainstorming solutions for these out-of-print scholarly works around the usual suspects: production, stewardship and copyright. In the absence of an umbrella institution that could coordinate these issues, the main question was how to resolve them in a way that would not depend on such an institution, but would still revolve around a collectivity. What you see here is the result of our continued conversations, and we offer it to the public with a healthy dose of both skepticism and drive. We encourage you to join our conversation.

Storage and Stewardship: The idea is for these texts to be made available to anyone with internet access, free of charge and in their entirety —no need to replicate Google Books, after all. There are many possibilities here, and we are exploring most of them fearlessly: HathiTrust, various university repositories, Open Library, an open subset of JStor, et al. We feel safe punting on this stage for now, because we must make sure that copyright and production can be taken care of before we go too far out to sea.

Copyright: To rescue these texts from the copyright limbo they inhabit, made evident by the Google Books affair, we would send a standardized letter to authors with an attached template meant for their publishers. Because contracts signed before the internet had no proviso for digital publication, we figure a simple addendum in the form of a written agreement between author and publisher would suffice. The allure for authors is immediate, since their often-neglected works would once more reach their community of interest.

For publishers it may not be a bad deal either, since we would add value in the form of proofing and metadata, while reserving for them any profits to be made —from, say, a print-on-demand venture— by adding an exception to the typical Creative Commons attribution, share-alike, non-commercial license. Our target publishers would be small-to-medium-sized presses that need help transitioning to the web. In the beginning, the selection of works will necessarily have to come from the scholars themselves, prioritizing in a sense the rescue of works that are pertinent to scholarship today. Nevertheless, we imagine the possibility of a clearinghouse for pre-established lists. Because we are talking about atomized copyright retrieval and, as you will soon see, atomized production, in the end this will only work if the model we are offering spreads across the land, and if each monadic work is joined at the hip to a collectivity of the sort that NINES exemplifies. Which brings me to the most exciting and perhaps most daring part of the project.


Production: The product so far is simple and familiar: a PDF image with well-proofed, searchable text behind it and companion RDF. Mass production is not so simple. In order to generate enough PDFs of out-of-print works to make the model an efficient way to publish in the long run, the solution had to involve crowd-sourcing of some sort, and this is where McGann had one of those moments he has. We could kill two birds with one stone, he suggested, by using the model to teach undergraduate English majors about digital humanities and textual criticism. I must confess he had me at “teach undergraduate English majors the basics of digital humanities and textual criticism.” And to put a cherry on top, he added, the model should also involve a graduate student (or two) to run digital workshops, effectively involving all levels of the academic food chain in the larger project of digital humanities. While the professor goes about his business, teaching lit as best suits his fancy, the teaching assistant(s) in the workshops can focus on DH related to the materials. (Jerry’s class pursues interpretational method via a performative, recitation-based approach to the study of poetic form and meaning.) I like to think of the workshops as a physics lab for lit-heads, minus the one extra credit-hour.

As we talked more and more about the pedagogical model, and in no small part thanks to the insight of Bethany Nowviskie (@nowviskie), we saw that the production of PDFs would have to make room for the digitization, markup and digital analysis of primary sources. With these added techniques the students can get a more complete picture of what it is we do around here, while broadening their understanding of the readings. If done right, we are offering the next generation an early start on the ongoing mass migration of paper to bytes that another generation of scholars began in the 90s.

We are launching our alpha model this semester in Prof. McGann’s ENAM 4500, English and American Poetry of the Nineteenth Century. About a week ago, Chris Forster (@cforster) joined the team, and together we will be running the workshops and writing this blog, which will serve as both documentation of and testament to this adventure. You should be hearing from him soon enough. Together with the blog entries, we will post all materials and tutorials used for the class under a Creative Commons license, so feel free to use them as the spirit moves you. In the meantime, let me briefly describe the digital component of the class as it stands.

Digital workshops will be conducted outside of class. Students will be required to come to a group workshop for one hour a week, alternating between instruction and discussion, and for another four hours a week the graduate assistants (Chris and I, in this case) will provide tech support in the form of office hours at the Scholars’ Lab. We initially estimate that students will spend an average of three hours a week working on the workshop outside of class. You can read the schedule for the digital workshop —and, in essence, the digital skills we will be working with— here.

The class will be divided into two student projects, corresponding to the midterm and the final. Project alpha is the PDF of the out-of-print scholarly work. Here is where students learn the production of high-end, well-proofed PDFs of secondary sources. The texts for this exercise were chosen carefully to harmonize with the content of the seminar, and students will be reading from them as well as reproducing them. Because the proof will be in the proofing, this exercise also allows for the teaching of matters that have traditionally fallen under the purview of textual criticism.

The second project is the production of a digital variorum of a small primary source —in this case, a couple of poems by Edgar Allan Poe. In this half, students will get their first introduction to markup in the form of a reduced version of TEI. You may have already seen the original TEI handout that I produced while Annie, Mike and I tested the digitization model over the summer break. Since that handout was written when we envisioned students working only with secondary sources, we will adapt it for primary sources when the time comes.

After the different instantiations of the poems are marked up properly, students will run them through the new version of Juxta, which is about to be released with spanking-new TEI functionality. We expect to teach students not only how to use the software effectively but also how to draw stemmatic conclusions from the comparisons —once again linking the digital component to the practice of textual scholarship.
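Juxta will do the heavy lifting for us, but the core collation idea (aligning two witnesses and surfacing their variant readings) can be illustrated with Python's standard difflib. The two "witnesses" below are invented variants of a familiar Poe line, used purely for illustration.

```python
import difflib

# Two hypothetical witnesses of a line; the variants are invented
# for illustration, not drawn from actual printings.
a = "Once upon a midnight dreary, while I pondered, weak and weary,".split()
b = "Once upon a midnight dreary, while I ponder'd, weak and weary --".split()

# SequenceMatcher aligns the two word sequences; the non-"equal" opcodes
# are exactly the variant readings a collation tool would flag.
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
    if tag != "equal":
        print(f"{tag}: {a[i1:i2]} -> {b[j1:j2]}")
```

Run over every pair of witnesses, an alignment like this yields the variant table from which stemmatic inferences (which witness likely copies which) can begin.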

So here we are at the University of Virginia, at the beginning of the semester of Prof. McGann’s ENAM 4500, English and American Poetry of the Nineteenth Century, and we are about to embark on a fascinating pedagogical/publishing experiment. A few days ago was the first day of class. We have a group of eight students whom we’ve divided into two teams of four. Each team has a leader in charge of coordinating the group’s activities and reporting to us. Each team also has a flash drive to help coordinate their files. As you can see, next week we begin with the ever-dreaded but oh-so-necessary scanning. Please join us then, as we document the plight of young twenty-somethings battling the 300dpi TIFF monster and the gargantuan scanners of the Scholars’ Lab.

Posted in General | 1 Comment

Project Tango

Project Tango has two related goals: first, to design a free, crowd-sourced model for providing open online access to the print archive of humanities scholarly monographs (university press books, in the first instance); and second, to locate the production process within the traditional courses of college and university humanities degree programs.

The immediate aim is to make accessible an important corpus of scholarly materials that is still in copyright. Because the user-community for these works is a special scholarly one, we think they ought to be treated quite differently from books produced and marketed to the general population. We also think there are clear advantages to be gained if that special community takes a major practical role in the digital dissemination of these works.

The pedagogical effort is to get undergraduate humanities students and faculty in many colleges and universities to begin participating actively and collaboratively in an important long-range scholarly research project: the migration of our paper-based cultural inheritance to an integrated online network. At present this migration is being driven by commercial rather than scholarly or educational agents. The result is work that either does not meet the needs or standards of scholars and educators, or work that, while well produced, is proprietary and therefore quite expensive and/or more or less access-restricted.

The model is conceived as an opening move for students and faculties in the humanities to begin regular research collaborations. In this case, the research work would focus on a project of fundamental importance to the future of humanities education.

The production model must meet the following requirements:

1.  The digital copies must be fully string-searchable, they must be web-accessible, and they must share a basic set of standard metadata: author, title, date, source, and, we initially think, genre (the last drawn from a small uniform set).

2.  The copies must be proof-corrected to scholarly standards of accuracy.

3.  The production model must be simple to implement both administratively and technically.
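Requirement 1's metadata set is small enough to sketch concretely. The record and the genre vocabulary below are hypothetical placeholders, not the project's actual values; the "small uniform set" of genres has not yet been fixed.

```python
# Hypothetical controlled vocabulary, invented for illustration.
GENRES = {"monograph", "essay collection", "critical edition", "bibliography"}

# Placeholder record; every field name here comes from requirement 1,
# but the values are invented.
record = {
    "author": "Jane Scholar",
    "title": "An Out-of-Print Study",
    "date": "1987",
    "source": "A University Press",
    "genre": "monograph",
}

# Every copy must carry the same basic fields, with genre constrained
# to the uniform set.
assert set(record) == {"author", "title", "date", "source", "genre"}
assert record["genre"] in GENRES
```

Keeping the field set this small is what makes requirement 3 (administrative and technical simplicity) plausible: a single flat record per work, with only one controlled field to validate.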

A test trial of the model will be run in one undergraduate English course in the fall term of 2010 at the University of Virginia. It will then be modified as necessary and tested again in several undergraduate courses in the spring term. At that point the model will be evaluated for next moves.

Various open depositories for the digital works are being considered.

NOTE: The project is named “Tango” because it is imagined as an agreeable dance between the only two parties who have a copyright claim on these works: the university presses and the authors of the works.  The parties are asked to agree to “dance together” and not to step on each other’s toes:  each party is left free to expose the individual monograph online in whatever ways they see fit.

Posted in General | Leave a comment