Every well-formed XML document is neatly arranged as a tree, with names for element nodes and all their attributes. This is enough for basic tools to correctly transmit and process XML; but for many applications it is useful to add more precise domain-specific constraints that we expect documents to satisfy. For this we have XML schema languages: specialised languages for describing types of XML document. This lecture covered one in particular, the Document Type Definition language DTD.
A DTD is a little like a type from a programming language: we can check that a value has a certain type, and a function may require arguments of a certain type; similarly we can validate an XML document against a schema, and some processing operation may require as input an XML document matching a certain schema. However, a single XML document may routinely match more than one schema — there is no concept of “the” schema for a document — and XML schema languages often appear more complex than familiar type systems.
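To make the type analogy concrete, here is a minimal Python sketch of DTD-style validation. It is not a real DTD validator; a real project would use a validating parser such as lxml's `DTD` class. It keeps only the `<!ELEMENT>` declarations, written as a hypothetical `schema` dictionary. It then checks each element's sequence of child names against its content model, which it translates into a regular expression. Attributes, mixed content and DTD's determinism rule are ignored. The same document passes two different schemas, one strict and one loose, which illustrates that there is no single schema for a document.

```python
import re
import xml.etree.ElementTree as ET

# Names, operators and brackets that can appear in an element content model.
TOKEN = re.compile(r"[A-Za-z_][\w.\-:]*|[(),|*+?]")


def content_model_to_regex(model):
    """Turn a content model such as '(title, author+, year?)' into a regex
    over child names, each followed by a single space."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue  # a sequence is plain concatenation
        elif tok in (")", "|", "*", "+", "?"):
            parts.append(tok)
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return "".join(parts)


def validate(elem, schema):
    """Return a list of problems; an empty list means every element conforms."""
    errors = []
    model = schema.get(elem.tag)
    children = list(elem)
    if model is None:
        errors.append(f"<{elem.tag}> is not declared")
    elif model == "EMPTY":
        if children or (elem.text or "").strip():
            errors.append(f"<{elem.tag}> must be empty")
    elif model == "(#PCDATA)":
        if children:
            errors.append(f"<{elem.tag}> may contain only text")
    elif model != "ANY":
        names = "".join(child.tag + " " for child in children)
        if not re.fullmatch(content_model_to_regex(model), names):
            errors.append(f"<{elem.tag}> children do not match {model}")
    for child in children:  # every descendant must also conform
        errors.extend(validate(child, schema))
    return errors


# Hypothetical schemas, standing in for declarations such as
#   <!ELEMENT book (title, author+, year)>   <!ELEMENT title (#PCDATA)>
strict = {"book": "(title, author+, year)", "title": "(#PCDATA)",
          "author": "(#PCDATA)", "year": "(#PCDATA)"}
loose = {"book": "(title | author | year)*", "title": "ANY",
         "author": "ANY", "year": "ANY"}

doc = ET.fromstring("<book><title>XML</title><author>Ann</author>"
                    "<author>Bob</author><year>2015</year></book>")
print(validate(doc, strict))  # []
print(validate(doc, loose))   # []

bad = ET.fromstring("<book><author>Ann</author></book>")
print(validate(bad, strict))  # ['<book> children do not match (title, author+, year)']
print(validate(bad, loose))   # []
```

The second document fails the strict schema only because `title` and `year` are missing; the loose schema accepts any mixture of the three children in any order.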
This lecture set out the details and usage of XML DTDs, and also how the content of a relational database can be transmitted through XML (and why). There were also announcements about the EUSA Teaching Awards and Innovative Learning Week, with an extended diversion on Unicode and the history of character sets.
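The following sketch shows one simple way to carry a relational table in XML. It assumes a convention chosen for illustration, not a standard mapping such as SQL/XML. The table becomes the root element, each tuple a `row` element, and each column a child element. NULLs are left out, and column names are assumed to be legal XML names already. Because the result is self-describing text, it can be sent to systems that know nothing about the source database. A DTD declaration such as `<!ELEMENT row (matric, name, year?)>` could then state which columns may be missing.

```python
import xml.etree.ElementTree as ET


def table_to_xml(table, columns, rows):
    """Encode relational rows as XML: one <row> per tuple, one child per column.
    A SQL NULL (None here) is represented by leaving that column's element out."""
    root = ET.Element(table)
    for row in rows:
        row_elem = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            if value is not None:
                ET.SubElement(row_elem, column).text = str(value)
    return root


# A hypothetical "students" table; the names and values are made up.
students = table_to_xml(
    "students",
    ["matric", "name", "year"],
    [("s123", "Ann", 2), ("s456", "Bob", None)],
)
print(ET.tostring(students, encoding="unicode"))
# <students><row><matric>s123</matric><name>Ann</name><year>2</year></row><row><matric>s456</matric><name>Bob</name></row></students>
```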
Link: Slides for Lecture 10
Homework
- Find out about Postel’s Law: what it says, what that means for computer languages and protocols, and what people think about it.
- Look inside the XML of SVG, .docx, and one of the other specialized XML formats. See Tuesday’s lecture for some ideas and instructions.
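For the second exercise, note the difference between the two formats. An SVG file is plain XML text that any editor will open, whereas a .docx file is a ZIP archive containing several XML parts, with the main text in `word/document.xml`. Below is a minimal Python sketch for listing and reading those parts with the standard `zipfile` module. The file name `report.docx` is a placeholder, and the reader assumes the part is UTF-8 encoded.

```python
import zipfile


def list_xml_parts(path):
    """Return the names of the XML parts (including .rels relationship files)
    stored inside an Office Open XML file such as .docx."""
    with zipfile.ZipFile(path) as archive:
        return [name for name in archive.namelist()
                if name.endswith((".xml", ".rels"))]


def read_part(path, part="word/document.xml"):
    """Return one part of the archive as text, assuming it is UTF-8 encoded."""
    with zipfile.ZipFile(path) as archive:
        return archive.read(part).decode("utf-8")


# Usage, with a placeholder file name:
#   for name in list_xml_parts("report.docx"):
#       print(name)
#   print(read_part("report.docx")[:500])
```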
Miscellany
- Wikipedia pages on some character sets: Baudot, ASCII, EBCDIC, ISO-8859-xx and Unicode.
- Some notes on the early years of Unicode.
- On the official Unicode site you can also read about What is Unicode?, the Principles of Unicode, what new additions are coming up, and take a walk around the Unicode roadmaps.
- You can watch the Unicode slideshow, all 105 characters in 3 hours.
- Here’s the Observing Eye proposal, accepted on 3 February 2015 for a future vote on inclusion in Unicode.
- Finally, should you decide to propose a new character or script yourself, you’ll need to follow these instructions.
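The bytes behind a character depend entirely on the character set in use. This small Python sketch encodes a string with several of Python's built-in codecs: `ascii`, `cp037` (one of several EBCDIC code pages), `latin-1` (ISO-8859-1), and two Unicode encodings, UTF-8 and UTF-16. It records None wherever a character set has no code for the character. Baudot has no Python codec, so it is left out.

```python
CODECS = ["ascii", "cp037", "latin-1", "utf-8", "utf-16-be"]


def encodings_of(text):
    """Map each codec name to the hex bytes encoding `text`, or None if that
    character set cannot represent it."""
    result = {}
    for codec in CODECS:
        try:
            result[codec] = text.encode(codec).hex(" ")
        except UnicodeEncodeError:
            result[codec] = None
    return result


for ch in ["A", "é", "€"]:
    print(ch, encodings_of(ch))
```

For example, "A" is byte 41 in ASCII, Latin-1 and UTF-8 but c1 in EBCDIC. The euro sign appears only in the Unicode encodings, as e2 82 ac in UTF-8.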