1. Functionality
The TEI Demonstrator (TD) uses a special-purpose software platform, developed at the Max Planck
Digital Library, which provides humanities researchers with the following
functionality:
- they can upload an existing TEI
resource and associated materials to an online
Repository;
- they can integrate the metadata
provided with their resource into the
Repository database to facilitate
cross-searching;
- they can extract subsets ("collections")
of documents (or document parts) from the
Repository on the basis of intelligent
searches across both metadata and content
within the Repository;
- they can download the results of
searches across the Repository as new TEI
documents for further analysis by other TEI
tools;
- they can upload a description of an
existing TEI resource maintained elsewhere
together with connectors which enable it to
become a federated part of the Repository.
These facilities will be provided by a customised
application of the eSciDoc digital
library platform, developed by the Max Planck Digital
Library team in Munich.
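The extraction of a "collection" and its download as a new TEI document can be illustrated with a minimal sketch. It does not use the eSciDoc API: the helper names (tei, matches, build_collection, sample) and the sample documents are hypothetical and invented for illustration, and the "search" is a plain substring match over the title (metadata) and the text (content). The sketch assumes that wrapping the matching documents in a teiCorpus element is an acceptable form for the result.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace
NS = {"tei": TEI_NS}
ET.register_namespace("", TEI_NS)  # serialise TEI elements without a prefix


def tei(name):
    """Return the namespace-qualified tag for a TEI element name."""
    return f"{{{TEI_NS}}}{name}"


def matches(doc, term):
    """True if `term` occurs in the title (metadata) or in the text (content)."""
    title = doc.findtext("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", "", NS)
    text = doc.find("tei:text", NS)
    content = "".join(text.itertext()) if text is not None else ""
    needle = term.lower()
    return needle in title.lower() or needle in content.lower()


def build_collection(docs, term, title):
    """Wrap every matching document in a new teiCorpus element."""
    corpus = ET.Element(tei("teiCorpus"))
    file_desc = ET.SubElement(ET.SubElement(corpus, tei("teiHeader")), tei("fileDesc"))
    ET.SubElement(ET.SubElement(file_desc, tei("titleStmt")), tei("title")).text = title
    publication = ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p"))
    publication.text = "Generated search result."
    source = ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p"))
    source.text = f"Documents matching the search term '{term}'."
    corpus.extend([doc for doc in docs if matches(doc, term)])
    return corpus


def sample(title, paragraph):
    """Build a tiny, invented TEI document for demonstration purposes."""
    return ET.fromstring(
        f'<TEI xmlns="{TEI_NS}"><teiHeader><fileDesc>'
        f"<titleStmt><title>{title}</title></titleStmt>"
        "<publicationStmt><p>Sample.</p></publicationStmt>"
        "<sourceDesc><p>Invented for illustration.</p></sourceDesc>"
        "</fileDesc></teiHeader>"
        f"<text><body><p>{paragraph}</p></body></text></TEI>"
    )


docs = [
    sample("First sample", "A knight rides out."),
    sample("Second sample", "A cat hides."),
]
collection = build_collection(docs, "knight", "Knight collection")
print(ET.tostring(collection, encoding="unicode"))
```

A real Repository would presumably run such searches against an index rather than scanning every document, but the shape of the result (a header describing the query, followed by the selected documents) would be similar.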
As well as providing an easy-to-use publication platform for TEI documents, the
TEI Demonstrator will showcase the variety of TEI practice. For that reason, the
TD will demonstrate clearly
- how resources are constructed -- the
XML markup should be visible as such, and its
semantics should be documented (where they differ
from those of the TEI)
- what the XML markup does -- for example, it should facilitate
alternative smart visualisations and intelligent searching
of the resource
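One simple way of making the construction of a resource visible is to list which TEI elements it uses and how often. The following is a minimal sketch only: the function name element_profile and the sample document are hypothetical, and nothing here is claimed to be part of the Demonstrator itself.

```python
from collections import Counter
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace


def element_profile(xml_text):
    """Count how often each element in the TEI namespace occurs in a document."""
    prefix = f"{{{TEI_NS}}}"
    counts = Counter()
    for element in ET.fromstring(xml_text).iter():
        # Elements from other namespaces are ignored in this sketch.
        if element.tag.startswith(prefix):
            counts[element.tag[len(prefix):]] += 1
    return counts


# An invented sample document, for illustration only.
SAMPLE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc>
    <titleStmt><title>Invented sample</title></titleStmt>
    <publicationStmt><p>Unpublished.</p></publicationStmt>
    <sourceDesc><p>Invented for illustration.</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body>
    <p>A letter from <persName>Zenon</persName>.</p>
    <p>A second paragraph.</p>
  </body></text>
</TEI>"""

for name, count in element_profile(SAMPLE).most_common():
    print(f"{name}: {count}")
```

Comparing such profiles across resources gives a rough first impression of how differently encoders have used the TEI.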
A "TEI Resource" here means
- one or more XML documents marked up in TEI P5 XML and conforming to
the TEI-all schema
- optionally, a more restrictive associated RELAX NG schema, W3C Schema, or DTD
against which those documents are valid
- a TEI P5-conformant ODD documentation file expressing the relation
between that schema and the TEI framework
- associated user-level documentation, tutorials etc. about the resource
- associated stylesheets or other mappings/connectors
Not all of these components will necessarily be available in all cases.
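A few basic properties of the XML documents in such a resource can be checked before upload. The sketch below is not schema validation: it only tests well-formedness, the root element, and the presence of a title in the header, using the Python standard library. Validation against TEI-all would need a RELAX NG validator, which is outside the scope of this example. The function name check_tei_document and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace
NS = {"tei": TEI_NS}


def check_tei_document(xml_text):
    """Return a list of problems; an empty list means the basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return [f"not well-formed XML: {error}"]

    problems = []
    # A TEI P5 document is rooted in TEI or teiCorpus, in the TEI namespace.
    if root.tag not in (f"{{{TEI_NS}}}TEI", f"{{{TEI_NS}}}teiCorpus"):
        problems.append(f"unexpected root element: {root.tag}")
    # The header must at least carry a title for the resource.
    if root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", NS) is None:
        problems.append("no title found in the TEI header")
    return problems


# An invented sample document, for illustration only.
GOOD = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc>
    <titleStmt><title>Invented sample</title></titleStmt>
    <publicationStmt><p>Unpublished.</p></publicationStmt>
    <sourceDesc><p>Invented for illustration.</p></sourceDesc>
  </fileDesc></teiHeader>
  <text><body><p>Some text.</p></body></text>
</TEI>"""

print(check_tei_document(GOOD))
print(check_tei_document("<TEI/>"))  # missing namespace and header
```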
2. Implementation
As noted above, the initial implementation of the
TEI Demonstrator will use the eSciDoc platform. The
first, pilot, phase of the project will provide access
to a small set of documents selected in order to test
the abilities of the platform to cope with the full
variety of TEI texts. The texts initially available
will be selected from those listed below. Some
modifications may be made to the texts to ensure that
they conform to the TEI-all schema, but in other
respects the coding choices of the original encoders
will be retained. We anticipate therefore that the
Demonstrator will present a number of different
perspectives on how specific encoding problems may be
addressed.
At a later stage, registered users will be able to
upload additional texts directly. It is planned to
launch the project to internal DARIAH partners and
collaborators by the end of May 2010, and to
demonstrate it in public at the DARIAH conference
planned for October 2010.
Texts for inclusion in the Demonstrator must be
made available under the terms of a Creative Commons
"CC BY" licence, permitting sharing and remixing, and
requiring attribution.
Following an initial call for sample texts circulated to the TEI list in November
2009 (see
initial texts offered), a small committee from the DARIAH WP7 group
identified the following texts for inclusion in the pilot:
- L'Est
Républicain (a corpus of contemporary
French newspaper text, supplied by the Centre National
de Ressources Textuelles et
Lexicales)
- The Tale of Samuel Whiskers (a 19th c.
English children's book, originally digitized by Project Gutenberg, and
converted into minimal TEI XML)
- Champfleury
(a 16th century French text, from the Bibliothèques
Virtuelles Humanistes)
- S Poti (two versions of an early 20th c.
Slovenian literary text, from the eZISS)
- The Queen's
Christmas Broadcasts 1952-2009 (a
collection of public speeches in English,
downloaded from the Royal website, converted
into TEI, and enriched with an automatic
morpho-syntactic analysis)
- Die deutsche
Turnkunst (an early 19th century German
polemic text, from the Deutsches
Textarchiv in Berlin)
- P. Zen. Pestm
(Ancient Greek and Demotic fragmentary
texts from the Zenon Archive, from the EPIDOC
community)
- Drawings of
Runestones (a multilingual transcription
of a manuscript describing and depicting a
collection of runic monumental inscriptions,
from the NFI)
- La Queste del
Saint Graal (a digital edition of a 13th c.
French manuscript with linguistic analysis,
from the Base de
Français Médiéval project)
- Actions and
Reactions (a volume of 19th c. English
short stories with minimal tagging)
Click on the title of each text listed above to
download a zip archive containing the parts of the
resource concerned which will be uploaded to the TEI
Demonstrator. Note that the majority of these texts
are already available via other websites; the
Demonstrator will not attempt to replicate the
functionality of these.