TEI Handout – Poetry Edition

XML and Books

XML stands for eXtensible Markup Language and as the word markup implies, it is a tool used to describe data. HTML, which you may be more familiar with, shares many similarities with XML, most obviously the use of tags enclosed in angle brackets ( < > ). But while the “data” in HTML consists of instructions to browsers on how to format and present web pages, in XML there is no predefined use or vocabulary for the tags (hence the “eXtensible” in XML). For example, you might use XML to encode data about your pets. If you own a pink Chihuahua named Pepe, you could “express” it in XML this way:

<dog type="mine" name="pepe">
   <color>pink</color>
   <breed>Chihuahua</breed>
</dog>
TIP: For a longer introduction to XML, visit w3Schools.

In a sense, all texts can be said to contain data of a certain kind. Literature and criticism are no exception to this rule. XML helps you name and organize that data. With XML, the possibilities are legion: we could, for example, name the kinds of content we find (<metaphor>, <character>, etc.), describe the physical attributes of a book (<paper>, <ink>, etc.), record how a text is laid out (<column>, <page_break>, etc.) or mark the logical units of a text (<line>, <paragraph>, etc.).

Because there are so many possibilities, scholars and scientists all over the world have agreed to use standards in their fields. In digital humanities, the most important standard set of predetermined tags, or tag-set, is the one provided by the Text Encoding Initiative (TEI). In this class we will be using an even smaller subset of that standard called TEI-lite to introduce you to the practice of tagging.

TIP: To deepen your knowledge of TEI and TEI-lite, you can explore TEI by example or read the TEI-lite documentation.

A basic TEI file includes a text (the linguistic content) and meta-data (information about the text). The first three tags you will learn already express the basic structure of a TEI file. The topmost tag includes all other tags and is named <TEI>. The tag which includes the information about the text is called the <teiHeader> and always precedes the tag which includes the text, appropriately named <text>.

The overall structure of the TEI file then looks something like this:

<TEI>
   <teiHeader>[information about the text]</teiHeader>
   <text>[the text itself]</text>
</TEI>

<teiHeader/>

In the template.xml file we provided, you will notice that the tags for the <teiHeader> are already there. All you have to do is provide the information itself. Within our teiHeaders there are two large categories that give us very useful information about a given digital edition:

  • <fileDesc>
  • <encodingDesc>

<fileDesc>
This is where we include information about the text we are encoding. There are three large categories we will be using within the fileDesc element:

<titleStmt>
The title statement refers to the digital work. For the most part, the digital work preserves most of the information from the print text, but adds information about responsibility for the production of the digital version.

IMPORTANT: Within the <titleStmt> you will find the <respStmt>, or responsibility statement. It is very important that you assign yourself the right unique identifier using the @xml:id attribute in the <name> element. Since you will be working in teams, this ID should be unique to you. We recommend you use the first two letters of your first and last name together. Ex: 

<respStmt> 
   <resp>encoded by</resp>
   <name xml:id="PaNe">Pablo Neruda</name>
</respStmt>

Your unique ID will be used in a few situations throughout the encoding process, so make a note of it, and make sure you use it consistently when required.

<publicationStmt>
The publication statement refers to the channels through which the digital work will be distributed and stewarded. This section has been filled out for you.

<sourceDesc>
Finally, the source description refers to the original printed text that you are encoding.
In order to complete the <titleStmt> and the <sourceDesc> you must ‘fill in the blank’ whenever you encounter brackets [ ] in the template. For example, if you encounter, <p>[name of your university]</p>, you write, <p>University of Virginia.</p>
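
As a rough sketch of how these three parts fit together, a filled-in <fileDesc> might look something like the following. The bracketed wording and the choice of child elements are placeholders of our own, and the template.xml you received may arrange or word things differently, so treat your template as the authority.

```xml
<fileDesc>
   <titleStmt>
      <!-- placeholder title and author: use those of your own poem -->
      <title>[title of the poem]: a digital edition</title>
      <author>[name of the poet]</author>
      <respStmt>
         <resp>encoded by</resp>
         <name xml:id="PaNe">Pablo Neruda</name>
      </respStmt>
   </titleStmt>
   <publicationStmt>
      <!-- already filled out in the template; shown here only as a stub -->
      <p>[distribution information]</p>
   </publicationStmt>
   <sourceDesc>
      <!-- the printed text you are working from -->
      <p>[citation for the printed source]</p>
   </sourceDesc>
</fileDesc>
```

Notice that the responsibility statement sits inside the <titleStmt>, which is why the title statement describes the digital work rather than the printed one.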

<encodingDesc>
The encoding description records the standards and methods used while encoding the text, which scholars strive to make explicit. In your case, these norms have already been filled out for you. Now, make sure you follow them!

<text/>

Within the <text> element, our mini tag-set will have tags for the following categories:

  • Body
  • Poetry (line groups and lines)
  • Emphasis
  • Your commentary

Front, Body and Back

Most texts are divided into front, body and back matter. A typical <text> section in a TEI file will also be divided into <front>, <body> and <back>. Here is the general structure of a <text> element:

<text>
   <front></front>
   <body></body>
   <back></back>
</text>

<front>

The front usually includes a title page, a table of contents, and other introductory materials. In many books, the front matter is easily identifiable because the pages are marked with lowercase roman numerals. [We will not use this category in our encoding since we are only encoding individual poems.]

<body>

The body is where we find the text proper.

<back>

As the name suggests, the back matter comes at the end of the book when the main content can be said to be over. Usual back matter includes an index and a colophon.
Since there is a lot of room for variety, marking sections within these three elements usually takes advantage of the <div> element explained below.

TIP: Read the article on “book design” in Wikipedia and see where the links take you.

<titlePage>

The title page of the book is very similar to the teiHeader. In print technology, this is where we would find our basic meta-data: author, title and publisher. [Since we are only encoding the poems individually, we won’t have this section in our TEI file.]

Sections

Sections are organized differently in different texts. This is why the generic tag <div> is used to <div>ide a text into parts. Since the tag is generic, we must give an attribute describing the content it is enclosing. An attribute is written inside a tag using the following syntax:

<element attribute="value">

In the case of <div> elements describing sections in a text we often find something like this:

<div type="chapter" n="1">
   <p>It is a truth universally acknowledged, that a single man in 
     possession of a good fortune must be in want of a wife.</p>
   <p>However little known the feelings or views of such a man may be on his
     first entering a neighbourhood, this truth is so well fixed in the minds 
     of the surrounding families, that he is considered the rightful property 
     of some one or other of their daughters.</p>
</div>

Notice that two attributes are named inside the opening tag: type and n (number). These attributes tell us this particular <div> is chapter no. 1. Divs are usually nested within each other to form a hierarchy. For example, a <div type="part"> may include several <div type="chapter">, which in turn include many <div type="pages">. Note: Usually every section of a book begins with a heading. Headings are marked with the tag <head>. Ex.: <head>Chapter 13</head>.
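
To make the nesting concrete, here is a small sketch of one part containing two chapters, each opening with a <head>. The headings and bracketed paragraphs are invented placeholders, not taken from any particular book.

```xml
<div type="part" n="1">
   <head>Part One</head>
   <!-- each chapter is a <div> nested inside the part -->
   <div type="chapter" n="1">
      <head>Chapter 1</head>
      <p>[first paragraph of the chapter]</p>
   </div>
   <div type="chapter" n="2">
      <head>Chapter 2</head>
      <p>[first paragraph of the chapter]</p>
   </div>
</div>
```

Every opening <div> has its own closing </div>, and the inner divs close before the outer one does.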

Paragraphs

As you may have noticed already, there are two types of elements (or tags): Those that nest content,

<some_element>some content</some_element>

and those without content,

<some_element/>

The tags for paragraphs always contain text in between the opening tag <p> and the closing tag, </p>. This text is said to be nested in <p>. The <p> tag is also shared with HTML and along with <div> is perhaps the most common tag out there. Here is the <p> tag in action:

<p>It is a truth universally acknowledged, that a single man
     in possession of a good fortune must be in want of a wife.</p>

In your source, most text can be enclosed by the <p> tag. It is very important that no content exists outside of the hierarchy. Some content, such as poetry, requires a different set of tags.

Note: Some paragraphs are extended quotes from another source. These will usually be marked in the text by separate indentation. In our encoding scheme we will mark these with the <q> tag instead of the <p> tag. This will allow us later to represent these “paragraphs” differently than the rest. An epigraph, for example, would be marked with a <q>.
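
A short sketch of what this might look like for a chapter that opens with an epigraph. The bracketed text is a placeholder, and where exactly the quote sits in your own file will depend on your source.

```xml
<div type="chapter" n="1">
   <head>Chapter 1</head>
   <!-- an indented, quoted passage marked with <q> rather than <p> -->
   <q>[text of the epigraph or extended quote]</q>
   <p>[first regular paragraph of the chapter]</p>
</div>
```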

Poetry and lists

Since you will be working with poetry you should pay special attention to this section. Marking poetry is a bit different than marking prose. It is, in a sense, more akin to marking lists in HTML. The two tags we will use are <lg> and <l>. The <lg> stands for a line group, and the <l> stands for a single line. All the lines must be nested inside the <lg>. The mark-up is straightforward. Here is some very basic markup of a poem:

<lg>
   <l>Shall I compare thee to a summer's day?</l>
   <l>Thou art more lovely and more temperate:</l>
   <l>Rough winds do shake the darling buds of May,</l>
   <l>And summer's lease hath all too short a date:</l>
   <l>Sometime too hot the eye of heaven shines,</l>
   <l>And often is his gold complexion dimm'd;</l>
   <l>And every fair from fair sometime declines,</l>
   <l>By chance or nature's changing course untrimm'd;</l>
   <l>But thy eternal summer shall not fade</l>
   <l>Nor lose possession of that fair thou owest;</l>
   <l>Nor shall Death brag thou wander'st in his shade,</l>
   <l>When in eternal lines to time thou growest:</l>
   <l>So long as men can breathe or eyes can see,</l>
   <l>So long lives this and this gives life to thee. </l>
</lg>

In many cases there are other aspects of the poem which you might consider marking up. We might note line numbers (for easier reference) as well as some statement of the poem’s prosody and form. Here is a more significantly marked up version of the same poem:

<lg type="sonnet" rhyme="abab cdcd efef gg">
   <lg type = "quatrain">   
     <l n="1">Shall I compare thee to a summer's day?</l>
     <l n="2">Thou art more lovely and more temperate:</l>
     <l n="3">Rough winds do shake the darling buds of May,</l>
     <l n="4">And summer's lease hath all too short a date:</l>
   </lg>
   <lg type = "quatrain">   
     <l n="5">Sometime too hot the eye of heaven shines,</l>
     <l n="6">And often is his gold complexion dimm'd;</l>
     <l n="7">And every fair from fair sometime declines,</l>
     <l n="8">By chance or nature's changing course untrimm'd;</l>
   </lg>
   <lg type = "quatrain">   
     <l n="9">But thy eternal summer shall not fade</l>
     <l n="10">Nor lose possession of that fair thou owest;</l>
     <l n="11">Nor shall Death brag thou wander'st in his shade,</l>
     <l n="12">When in eternal lines to time thou growest:</l>
   </lg>
   <lg type = "couplet">   
     <l n="13">So long as men can breathe or eyes can see,</l>
     <l n="14">So long lives this and this gives life to thee. </l>
   </lg>
</lg>

This encoding numbers the individual lines, declares that this is a sonnet and notes the rhyme scheme as an attribute of the poem. It also breaks the quatrains and couplet into separate line groups (note that a line group does not need to be typographically separate, as a stanza is; a line group is just… a group of lines). For more examples see TEI by Example’s Poetry Module, which includes a number of interesting examples, including a Shakespearean sonnet that is even more extensively marked up to include information about meter.

The poems that you will be encoding will probably be divided into stanzas. You should mark these using @type="stanza", making sure to also mark the @n (number) of the stanza. Notice also that you can mark the @n for each individual line.
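
Here is a minimal sketch of a two-stanza poem placed inside the <text> element. The bracketed lines are placeholders for your own poem, and numbering the lines continuously across stanzas (as in the sonnet above) is our assumption; follow whatever your encoding description says if it differs.

```xml
<text>
   <body>
      <!-- each stanza is a line group with its own number -->
      <lg type="stanza" n="1">
         <l n="1">[first line of the poem]</l>
         <l n="2">[second line of the poem]</l>
      </lg>
      <lg type="stanza" n="2">
         <!-- line numbers continue from the previous stanza -->
         <l n="3">[third line of the poem]</l>
         <l n="4">[fourth line of the poem]</l>
      </lg>
   </body>
</text>
```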

Page and line breaks
Most paragraphs or line groups in a TEI file won’t require more tweaking than placing them inside <p> or <lg> tags. Some content you will encounter, though, will require you to mark the presence of a carriage return. This is marked in TEI with the tag <lb/>. Notice this is a tag without content. The use of this tag might not be necessary after all in your particular text, but it is nice to know it’s there.
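
For instance, a paragraph whose source breaks onto a new line at a meaningful point could be sketched like this (the bracketed text is a placeholder):

```xml
<!-- <lb/> marks the point where the source starts a new line -->
<p>[text before the break]<lb/>[text that begins on a new line]</p>
```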

As opposed to the <lb/>, we use a page break tag, <pb/>, every time we encounter the end of a page. It is also here, in this element, that we usually mark the page number in the form of a @n attribute. The value of the @n should correspond to the following page. This way every <pb/> will come at the beginning of the page it corresponds to. Ex.:

<p>... Walpurgisnacht of the novel is Shrove Tuesday. Mann marks <pb n="ix"/>the 
curiously timeless passing of time in the magic mountain with feast days...</p>

Because it is a tag without content, it can be placed almost anywhere in the hierarchy. In this case it was placed inside a <p> element. If the page had ended where the paragraph ended, our <pb/> tag could have been placed after the closing </p> tag:

<p>... Walpurgisnacht of the novel is Shrove Tuesday. Mann marks the 
curiously timeless passing of time in the magic mountain with feast days...</p>
<pb n="ix"/>
IMPORTANT: Every encoding is itself an interpretation of the text and represents a particular set of choices and priorities. In our case, we are focusing on the text itself and not on its printing, therefore we will not use the headers and footers on each page of the printed text in our encoded version.

Emphasis

One of the most common attributes in TEI is the @rend attribute (notice that the @ symbol precedes attributes when we name them in documentation). The @rend attribute usually names the way a particular text segment is rendered in the source. For our basic tag set we will only use two values: italics and underline. Whenever you encounter words underlined or in italics in the original, we express these most of the time by adding the @rend attribute to a <hi> (highlight) element (or directly to the <p> element if all the content in that paragraph is emphasized). It is another convention to use the language of CSS, which is used to describe how HTML pages are to be rendered online, to express the value of @rend. The CSS values for italic and underlined text are “italic” and “underline.” Here is an example:

<p><hi rend="italic">And this Indenture further witnesseth</hi>
that the said <hi rend="italic">Walter Shandy</hi>, merchant,
in consideration of the said intended marriage ...
</p>
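
The same approach should carry over to poetry. As a sketch, with placeholder text and on the assumption that @rend may be placed on a whole <l> in the same way as on a whole <p>:

```xml
<lg type="stanza" n="1">
   <!-- only one word is emphasized, so it gets its own <hi> -->
   <l n="1">[a line with one <hi rend="underline">underlined</hi> word]</l>
   <!-- the whole line is emphasized, so @rend goes on the <l> itself -->
   <l n="2" rend="italic">[a line printed entirely in italics]</l>
</lg>
```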

Indentation

The @rend attribute is very useful also to mark other features of the original we could call “bibliographic.” You will notice that written poetry does not communicate its meanings only by its words, but also by the way the words are distributed on the page, by the emphasis on certain words, even by the font used. We can call these extra-linguistic features of a text “bibliographic.” Indentation is one of these features that you will encounter in Emily Dickinson. To mark indentation on a given line we will use the @rend attribute like this:

<l rend="indent">
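
In context, a stanza whose second and fourth lines are indented might be sketched as follows. The lines are placeholders; note that the @ symbol is only used when naming an attribute in prose and is not typed inside the tag itself.

```xml
<lg type="stanza" n="1">
   <l n="1">[a line set flush with the left margin]</l>
   <l n="2" rend="indent">[an indented line]</l>
   <l n="3">[a line set flush with the left margin]</l>
   <l n="4" rend="indent">[an indented line]</l>
</lg>
```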

Special Characters

As you have probably noticed by now, Emily Dickinson’s poems are frequently edited with an overwhelming number of em-dashes. The em-dash is considered a special character, just like the ones we saw when we were doing the OCR/proofing exercise. If we were to enter these em-dashes simply by using the keyboard (Alt+0151) to produce the character itself, —, we would run the risk of losing the character in some of our transformations from XML to HTML or another platform. To make sure we can work with special characters in XML files, we use a numeric character reference (NCR). An NCR identifies a character by the number Unicode assigns to it, in a form that both XML and HTML understand. The number is written between an ampersand followed by a hash sign (&#) and a semicolon. The number for the em-dash is 8212, so we would write it &#8212;. Because it would be too much of a pain to remember this number every time you need to type an em-dash, we have “converted” it in your templates to a character entity reference. The conversion takes place at the top of the TEI file, inside the <!DOCTYPE> declaration. You don’t have to worry about that right now. Suffice it to say that when you need to insert an em-dash you should type:

&mdash;
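
If you are curious, the following is a minimal sketch of how such a conversion can be declared. The declaration in your template may be worded differently; the bracketed text is a placeholder, and only the general pattern (an <!ENTITY> line inside the <!DOCTYPE>) is the point here.

```xml
<!DOCTYPE TEI [
   <!-- hypothetical declaration: maps the name mdash to the em-dash character -->
   <!ENTITY mdash "&#8212;">
]>
<TEI>
   <teiHeader>[information about the text]</teiHeader>
   <text>
      <body>
         <lg>
            <!-- &mdash; is replaced by the em-dash when the file is processed -->
            <l>[words before the dash] &mdash; [words after the dash]</l>
         </lg>
      </body>
   </text>
</TEI>
```

Without a declaration like this one, an XML parser would not recognize &mdash; and would report an error, which is why the template takes care of it for you.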

<!-- Making your own commentaries -->

Finally, it is important you know there is a way to be perfectly free to write your own notes on the text you are working with. You will find this is a good way for you to mark trouble spots you want someone else to look at, for you to explain the rationale behind an unusual tagging decision, or even for you to offer your reading of a particular passage if the spirit moves you so! To make a commentary you simply include it within the following characters: <!-- -->. Ex:

<p>...yes I said yes I will Yes.</p>
<!-- lol, Molly should really make up her mind! -->