IT Services, University of Oxford

1. Where do you get a schema?

  • Different projects have different requirements... but also overlap
  • and there will always be some unexplored areas (that's what research is for)
  • which is why the TEI is designed the way it is...

2. The TEI architecture to the rescue

The TEI offers you a semi-automatic procedure for selecting from hundreds of markup specifications. You can:

  • Just choose everything... (probably not a very good idea)
  • Work with one of the predefined selections (TEI Lite, TEI Bare...)
  • Roll your own, according to the specific needs of your project

Roma (or Byzantium): an online tool which makes this task easier for top-down people
ODD by example: another tool which makes this task easier for bottom-up people

We'll try both...

3. For example ...

TEI out of the box is designed to work with traditionally organised books and manuscripts. But suppose we want to work on a slightly different kind of object... a postcard collection, or a monumental inscription? How do we make a TEI schema capable of handling hundreds or thousands of things like this:

4. A postcard (front)

5. A postcard (back)

6. Another postcard

Not all cards are organized the same way...

7. Which are the most significant components of these texts?

  • the picture
  • the postmark
  • the printed part
  • the message(s) written on them
  • the addressee(s)
  • subject matter of the picture
  • information about the publishing, printing, circulation of the card or other metadata...

8. Suggestion

We could begin by structuring the card as divisions of various types

Physically:
  • recto: one side, usually the one with the picture
  • verso: the other side, usually the one with the message
On these two surfaces, we expect to find various other subsections, such as:
  • the message
  • information about the sending of the card, notably:
    • the addressee
    • the postmark, stamp, etc.
  • data about the publication, sale, collection, etc. of this card

9. First try at encoding a postcard

<carte n="0010">
<recto url="cartes/19800726_001r.jpg"/>
<verso url="cartes/19800726_001v.jpg">
<obliteration>
<date>PM ?? Jul ???</date>
<lieu>EL PASO. TX 799</lieu>
</obliteration>
<message>
<p>26 juill 80</p>
<p>Chère Madame, après New-York et Washington dont le gigantisme m'a beaucoup séduite, nous avons commencé notre conquête de l'Ouest par New Orleans, ville folle en fête perpétuelle. Il fait une chaleur torride au Texas mais le coca-cola permet de résister – l'Amérique m'enchante ! Bientôt, le grand Canyon, le Colorado et San Francisco... En espérant que vous passez de bonnes vacances, affectueusement </p>
<p> Sylvie </p><p>François. </p>
</message>
<destinataire>
Madame Lefrère
4, allée George Rouault
75020 Paris
France
</destinataire>
</verso>
</carte>

10. Commentary

  • We didn't use the TEI vocabulary. This means we may have trouble sharing our data with non-French speakers, explaining it to them, or benefiting from their work.
  • We haven't included all the things that might be encoded: for example, corrections in the text, layout of the components, names of people or places referred to, linguistic or historical features, bibliographic data about where the card was printed ...
  • We haven't structured (for example) the address, which will make intelligent searching difficult.
  • Of course, we can always invent more tags for these things. But isn't it rather a waste of our time if the TEI has already done the job?

11. TEI version

  • We regard each card as a <text> containing two <div> elements, one for the recto and one for the verso of each card.
  • We mark up each functional division of the card as a <div type="[function]">
  • Metadata about the published card will go in a <bibl> in the TEI Header
  • We mark up names of people and places with <name> and dates with <date>
  • We use the attribute facs to associate parts of transcribed text with their digital image, indicated by a <graphic> element.
  • We use <address> for the address; <stamp> element for stamps, postmarks, and similar things.
  • We may also need <del> (for deletions), <add> (for additions), <reg> (for regularized spellings), <unclear> for things we cannot read, <lb> for line breaks ...
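
As a rough illustration of the transcription elements in the last bullet, here is a minimal sketch. The deletion, addition, regularization and unclear reading shown are invented for the purpose of the example; none of them is attested on the actual card.

```xml
<!-- Hypothetical transcription: the corrections below are invented for illustration -->
<p xml:lang="fr">Il fait une chaleur
 <del>terrible</del>                     <!-- a word struck out by the writer -->
 <add place="above">torride</add>        <!-- its replacement, written above the line -->
 au <reg>Texas</reg>                     <!-- a spelling regularized by the encoder -->
 <lb/>mais le                            <!-- a new line begins on the card here -->
 <unclear reason="smudged">coca-cola</unclear>  <!-- a reading we are not sure of -->
 permet de résister</p>
```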

Will that be enough?

12. First try at a TEI version: the header

<teiHeader>
 <fileDesc>
  <titleStmt>
   <title>San Antonio River : digital edition of card 19800726_001 from the Virgolos collection</title>
  </titleStmt>
  <publicationStmt>
   <p>Demonstration at DH OXSS 2013</p>
  </publicationStmt>
  <sourceDesc>
   <bibl>
    <title level="m">San Antonio River (postcard)</title>
    <publisher>School Mart</publisher>
    <pubPlace>1812 South Press, San Antonio, Texas 70210</pubPlace>
    <idno>SA-146-C</idno>
    <note resp="#ed">The San Antonio river, often called the Venice of Texas, winds its way through the business section of San Antonio. It is very picturesque with its many bridges and beautifully landscaped banks.</note>
   </bibl>
  </sourceDesc>
 </fileDesc>
</teiHeader>

13. First try at a TEI version: the text

<text>
 <body>
  <div type="recto">
   <figure>
    <graphic
      url="../../Graphics/Cartes/19800726_001r.jpg"/>

    <figDesc>View of a stream with a stone bridge and little Mexican-style houses. In the foreground a man and a woman are riding in an open boat.</figDesc>
    <head>San Antonio River</head>
   </figure>
  </div>
  <div facs="19800726_001v.jpg" type="verso">
   <div type="message">
<!-- ... -->
   </div>
   <div type="destination">
    <p>
<!-- stamps -->
    </p>
    <p>
     <address>
<!-- ... -->
     </address>
    </p>
   </div>
  </div>
 </body>
</text>

14. First try at a TEI version: the message

<div type="message" xml:lang="fr">
 <p>
  <date when="1980-07-26">26 juill 80</date>
 </p>
 <p>Chère Madame, après New-York et Washington dont le gigantisme m'a beaucoup séduite, nous avons commencé notre conquête de l'Ouest par New Orleans, ville folle en fête perpétuelle. Il fait une chaleur torride au Texas mais le coca-cola permet de résister – l'Amérique m'enchante ! Bientôt, le grand Canyon, le Colorado et San Francisco... </p>
 <p> En espérant que vous passez de bonnes vacances, affectueusement. </p>
 <signed>Sylvie </signed>
 <signed>François </signed>
</div>

15. First try at a TEI version: the destination

<div type="destination">
 <ab>
  <stamp type="postmark">
   <placeName>El Paso</placeName> - TX 799 -<date notBefore="1980-07-26">
    <unclear>PM JUL</unclear>
   </date>
  </stamp>
  <stamp type="postage">Profil masculin, avec un avion et un radar au second
     plan: <mentioned>US Airmail 21 c.</mentioned>
  </stamp>
 </ab>
 <ab>
  <address>
   <addrLine>Madame <name>Lefrère</name>
   </addrLine>
   <addrLine>4, allée George Rouault</addrLine>
   <addrLine>75020 Paris</addrLine>
   <addrLine>France</addrLine>
  </address>
 </ab>
</div>
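
The address above is split only into lines. If searching by street, postcode or town matters, a finer-grained encoding is possible. The following is a sketch, not part of the original encoding: it assumes that <postCode> (core module) and <settlement> and <country> (namesdates module) are added to the schema, since the ODD shown later does not include them.

```xml
<!-- Sketch only: postCode, settlement and country are assumed additions to the schema -->
<address>
 <name type="person">Madame Lefrère</name>
 <street>4, allée George Rouault</street>
 <postCode>75020</postCode>
 <settlement>Paris</settlement>
 <country>France</country>
</address>
```

The trade-off is that the line-by-line layout of the written address is no longer recorded, so a project has to decide which view matters more.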

16. Why use TEI (or any other common framework)

  • re-usability and repurposing of resources
  • modular software development
  • lower training costs
  • ‘frequently answered questions’ — common technical solutions for different application areas

The TEI was designed to support multiple views of the same resource. The TEI is an evolving model of the concerns of Digital Humanities.

17. A word on TEI Conformance

A document is TEI Conformant if and only if it:
  • is a well-formed XML document
  • can be validated against a TEI Schema, that is, a schema derived from the TEI Guidelines
  • conforms to the TEI Abstract Model
  • uses the TEI Namespace (and other namespaces where relevant) correctly
  • is documented by means of a TEI Conformant specification (an ODD file) which refers to the TEI Guidelines
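
A minimal sketch of a document that satisfies the first and fourth conditions (well-formedness and correct use of the TEI namespace). The title and other text content are placeholders invented for the example. Whether the document is valid depends on the schema it is checked against, and the ODD that documents it is a separate file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The xmlns declaration places every element below in the TEI namespace -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <teiHeader>
  <fileDesc>
   <titleStmt>
    <title>Placeholder title</title>
   </titleStmt>
   <publicationStmt>
    <p>Placeholder publication information</p>
   </publicationStmt>
   <sourceDesc>
    <p>Placeholder description of the source</p>
   </sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <body>
   <p>Placeholder text</p>
  </body>
 </text>
</TEI>
```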

Standardization should not mean ‘Do what I do’, but rather ‘Explain what you do in terms I can understand’

18. For the TEI, that explanation is an ODD

  • Experimenting in this way, we can develop a vocabulary which is specific to our project
  • but which is also understandable outside our project
  • because it is defined and documented according to predefined standards

That's what an ODD is (One Document Does it all)

19. How do you define an ODD?

You need to
  • decide on the elements and attributes you need
  • specify their content and values
  • document any special usage rules or restrictions

ODD allows you to do this by recycling/modifying the existing TEI definitions.
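
For instance, the existing definition of <div> could be modified so that its type attribute accepts only the values used in the postcard examples. This is a sketch under assumptions: the descriptions are invented for illustration, and the exact mode values should be checked against the TEI Guidelines chapter on documentation elements before use. The fragment would sit inside the <schemaSpec>.

```xml
<!-- Sketch: restrict the values of @type on <div> to those used for postcards -->
<elementSpec ident="div" module="textstructure" mode="change">
 <attList>
  <attDef ident="type" mode="change">
   <valList type="closed" mode="replace">
    <valItem ident="recto">
     <desc>the side of the card that usually carries the picture</desc>
    </valItem>
    <valItem ident="verso">
     <desc>the side of the card that usually carries the message</desc>
    </valItem>
    <valItem ident="message">
     <desc>the message written by the sender</desc>
    </valItem>
    <valItem ident="destination">
     <desc>the stamps, postmark and address</desc>
    </valItem>
   </valList>
  </attDef>
 </attList>
</elementSpec>
```

This covers all three steps listed above: the element is chosen, its permitted values are specified, and each value carries its own documentation.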

20. An ODD for postcards

<schemaSpec ident="teiCards" docLang="en" start="TEI teiCorpus">
 <moduleRef key="tei"/>
 <moduleRef
   key="textstructure"
   include="TEI body dateline div postscript salute signed text"/>

 <moduleRef
   key="core"
   include="add addrLine address bibl date del foreign graphic head hi item lb list name p publisher q reg resp respStmt street teiCorpus title unclear"/>

 <moduleRef
   key="header"
   include="teiHeader fileDesc titleStmt publicationStmt sourceDesc"/>

 <moduleRef key="figures" include="figure figDesc"/>
 <moduleRef key="msdescription" include="stamp"/>
 <moduleRef key="transcr" include="att.global.facs"/>
 <moduleRef key="namesdates" include="persName placeName"/>
 <elementRef key="ab"/>
<!-- ... -->
</schemaSpec>

This is just the formal part of the ODD, which defines the schema. The rest of the ODD provides human-readable documentation...
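
To show how the two parts fit together, here is a sketch of a complete ODD file: an ordinary TEI document whose body mixes prose documentation with the <schemaSpec>. The titles, headings and prose are invented for illustration, and the schema specification is abbreviated to a single module reference.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="en">
 <teiHeader>
  <fileDesc>
   <titleStmt>
    <title>TEI for postcards</title>  <!-- hypothetical title -->
   </titleStmt>
   <publicationStmt>
    <p>Unpublished demonstration ODD</p>
   </publicationStmt>
   <sourceDesc>
    <p>Born digital</p>
   </sourceDesc>
  </fileDesc>
 </teiHeader>
 <text>
  <body>
   <div>
    <head>Encoding the cards</head>
    <!-- Human-readable documentation, written as ordinary TEI prose -->
    <p>Each card is a <gi>text</gi> containing one <gi>div</gi> for the recto
       and one for the verso.</p>
   </div>
   <div>
    <head>Formal specification</head>
    <schemaSpec ident="teiCards" docLang="en" start="TEI teiCorpus">
     <moduleRef key="tei"/>
     <!-- the remaining moduleRef and elementRef declarations go here -->
    </schemaSpec>
   </div>
  </body>
 </text>
</TEI>
```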

21. ODD processing

An ODD can be transformed to generate
  • complete documentation for your TEI application in a variety of formats (HTML, DOCX, PDF...)
  • one or more schemas (a formal grammar) which can be used to validate your TEI XML files
The TEI Stylesheets include XSLT to do both of these things. They can be used
  • on the web using Roma or Byzantium
  • at the command line, in a Unix environment
  • within oXygen
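
Once a schema has been generated, a TEI file can point to it with an xml-model processing instruction, which editors such as oXygen use to find the schema for validation. In this sketch the file name teiCards.rng is an assumption (a RELAX NG schema generated from the postcard ODD and saved next to the document). The document is well-formed as written but becomes valid only once the header and text are filled in.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- teiCards.rng is a hypothetical file name for the generated RELAX NG schema -->
<?xml-model href="teiCards.rng" type="application/xml"
            schematypens="http://relaxng.org/ns/structure/1.0"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
 <!-- the teiHeader and text of one postcard go here -->
</TEI>
```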


Lou Burnard
Copyright University of Oxford