5 Load, modify, validate a complete ready-made document

In this section we are going to walk you through what is probably you're first session editing an XML document. We will assume you have installed oXygen XML editor. As explained in the previous section, you can edit TEI documents in many different XML editors. oXygen however is currently a popular choice in the TEI community. Whatever editor you use, it should be possible to do many of the things we will mention here. We encourage you to take the time to get familiar with the editor. It will increase your confidence while working with XML. It will also save you lots of time, if you know some of the shortcuts that your editor provides for actions that often recur.

If you feel you are already fairly comfortable with XML editing, you can skim through this section and go to the next.

  1. To get started, go to http://tei.oucs.ox.ac.uk/GettingStarted/html/lm.html and download diaryfragment.zip. Extract the files into some folder that you have access to. You should see there two files: diaryfragment.xml and graves1938-10-10.jpg. The XML file integrates the examples that we used in section [ID os in TEI Guidelines].
  2. Open the XML file in oXygen. If (on Windows) you associated the XML file type with oXygen, you can do that by double-clicking the XML file. Otherwise, start oXygen and open the file from the program.

What you see now may look like this:

Figure 6. Sample file loaded in Oxygen
Sample file loaded in Oxygen

We will now walk through a number of this editor's features. Many of those you will also find in other editors.

5.1 Multiple views and modes

Next to the main document view, you notice a number of smaller windows. They give information either about the text itself or the things that are allowed at that current location in the text. They can be opened (in oXygen) using the Project Menu, option Views. In the image above, we notice among others an outline view. The outline view shows an indented view of the hierarchical structure of the XML document, including elements, identifying attributes and text.

Figure 7. Outline view
Outline view

Another view is the attributes view. The attributes view shows the attributes available on the current element and their values. It can also be used to edit the attributes' values.

Figure 8. Attribute view
Attribute view

The elements view shows the elements that the current element can contain. Double-clicking an element inserts it into the document at the cursor location.

Figure 9. Elements view
Elements view

While these views can offer valuable information, of course they also use space that you could use for the edited file.

The edited text as it is displayed in the images we saw up to now includes the text and the XML tags. It is oXygen's standard display, and it is called the text mode. There are also two other editing modes, called grid mode (useful for editing spreadsheet-like XML document, but not for TEI documents) and author mode. In author mode, the XML tags are hidden. The buttons that switch between the modes are located in he left bottom corner of the text view

Figure 10. Mode switch buttons
Mode switch buttons

Author mode, as we see below, formats the text in an author-friendly view, that helps the author focus on the document text rather than on XML tagging. We notice that some elements have received an appropriate display: <del> elements are shown in red and crossed out, <add> elements are shown in green.

Figure 11. Document in author mode
Document in author mode

The user can choose the amount of XML tagging that is made visible in author mode. Buttons on the toolbar can be used to switch between several options. More information, including a video demonstration, is available on the oXygen site.

Figure 12. Choosing the amount of tag information to be displayed in author mode
Choosing the amount of tag information to be displayed in author mode

5.2 The schema

In the previous section we envountered a number of extra views next to the edited document, including overviews of the attributes and elements that are available in a specific location. The information in these views does not come from the document, but from the schema that describes the document. How does the editor know which schema to apply? After all, the XML file does not refer to this schema.

The answer is that the editor uses certain rules to guess what type of file it is dealing with. In this case, it decides this is a TEI document, based on the TEI namespace http://www.tei-c.org/ns/1.0.

Figure 13. Choosing the TEI namespace
Choosing the TEI namespace

We could have explicitly chosen to use a certain schema. Unfortunately, for the type of schema that we use in this document, Relax NG schema's, there is no standard way to set the connection between document and schema. oXygen uses an XML processing instruction. The next figure shows how to create such an instruction, and the resulting but of XML code.

Figure 14. Associating a document with a schema
Associating a document with a schema

The schema will be used for three purposes: to present allowed elements, attributes, and values to the user when he/she is editing the file, to validate the contents of the file, and to provide information about the definition of the element or attribute that the mouse hovers over. This last application of the schema you can verify immediately by placing the mouse over any eleent or attribute. The next figure shows an example.

Figure 15. Placing the mouse over the n attribute shows the attribute's definition, taken from the schema.
Placing the mouse over the n attribute shows the attribute's definition, taken from the schema.

5.3 Modifying the document

Let us modify the document, e.g. by adding another paragraph after the last one in the diary entry. Place the mouse after the closing </p>, hit Enter and type <. Based on the schema, the editor knows what elements are allowed in this context, and offers a list of these elements, that you can use the keyboard or the mouse to select an element from. Alongside the selected elemet the system shows its definition.

Figure 16. An element selection list appears after the user types the opening bracket ("<").
An element selection list appears after the user types the opening bracket ("<").

We select the <p> (paragraph) element. Note that the system adds the closing tag for us. Let us type some text and highlight a word. That gives us an opportunity to show another way of adding an element: selecting a piece of text and choosing the "Surround with tags" option, either from the context menu (option Refactoring) or using the Ctrl-E keyboard shortcut.

Figure 17. Surrounding a selection with an element.
Surrounding a selection with an element.

The system can also prompt the user with the allowed attributes. Once we have created the hi element to highlight a word, we can add the rend attribute to say how it should be highlighted. Once we type the first space character in the opening tag <hi> the system assumes we want to add an attribute and shows them as follows:

Figure 18. An attribute selection list appears after typing a space in an opening tag.
An attribute selection list appears after typing a space in an opening tag.

If the schema would have defined a fixed list of allowed values for the rend attribute, we could select one of these values in the same way.

5.4 Validation

There are several levels of checking XML documents for correctness. At a syntactical level, XML documents are checked for well-formedness. This is a check for the properties described in 2.1.2 XML Syntax in a Nutshell. At a semantic level, XML documents can be checked for validity with respect to a schema. The process of 'validation' is usually understood to do both types of checks.

Just like modern word processors do spell checking on the fly, modern XML editors do validation 'as you type'. The syntax checker finds a problem, e.g., immediately when we remove the closing tag (</p>) of the paragraph we added in our sample document.

Figure 19. Syntax error
Syntax error

Notice the removed </p>, the red curly line indicating that something is wrong (the <div> closes before the <p> inside it is closed), the indicator that there is an error in the right margin, and the error message below. Now let us make a semantic error and create a <ptest> element after our paragraph. The <ptest> element does not exist, so we will have to type that manually. Again, the system will tell us we have made a mistake: using the red curly line, the indicator in the margin and the error message.

Figure 20. Validation error
Validation error

The user can also explicitly request validation. This may be useful e.g. when nothing has changed in the document, but the schema has been modified. The user can then ask to refresh the schema cache before validation is done.

Figure 21. Activate validation manually
Activate validation manually

After validation, the right margin will show the location of errors in the document. Clicking the red bar will bring up the corresponding error. The red square at the rop indicates the document contains errors.

Figure 22. Location of errors after validation
Location of errors after validation

5.5 And more...

Of course the editor contains a lot more functionality that can help make life easier for the XML encoder. We cannot go into more detail here, but again we urge the reader to explore and take advantage of what is there.

Date: 2013-03-21