We move from the strict rectangles of structured data to the more generous triangles of semistructured data. This morning’s lecture gave an overview of what kinds of data count as “semistructured”; the idea of trees as a mathematical model of data; the particular form of trees in the XPath data model; and their textual representation in XML, the Extensible Markup Language.

XML also has a large number of domain-specific variants. Documents in these variants are all still XML: each variant uses a standardised set of element types to give a custom language for representing data relevant to a particular field, from musical scores to financial trading.
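
As a small illustration of the link between XML text and trees, here is a sketch in Python that parses a tiny SVG document (one of those domain-specific variants) and prints its elements as an indented outline. The SVG text and the helper names `local_name` and `outline` are made up for this example. Note also that the standard-library `ElementTree` module uses a simpler tree than the XPath data model: this sketch only visits element nodes, and ignores attributes and text.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written SVG document (illustrative only).
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <g id="shapes">
    <circle cx="50" cy="50" r="40"/>
    <rect x="10" y="10" width="30" height="30"/>
  </g>
</svg>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + local_name(element.tag)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SVG_TEXT)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

The indentation of each printed line is the depth of that element in the tree: `svg` is the root, `g` is its only child, and `circle` and `rect` are the children of `g`.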

##### Homework
###### 1. Read This
• XML Essentials from the World Wide Web Consortium (W3C).

• Sections 2.1–2.5 from Chapter 2 of Møller and Schwartzbach. I have sent a scanned PDF of this chapter to all students by email; there will also be printed copies outside the ITO in Appleton Tower; and you can find the whole book in the Library HUB.

###### 2. Do This
• Find an SVG file and open it in a text editor to study its XML content.

• Find a Microsoft Office .docx file and look at the XML content in that. This format (OOXML) is in fact a zipped archive of XML files, so you will need to unzip it first. Depending on your platform, this may require renaming the file so that its extension is .zip rather than .docx.
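
If you would rather explore a .docx file from a program than by hand, the following Python sketch may help. It uses the standard `zipfile` module to list the XML files inside the archive and to read the root element of one of them. The helper names `xml_parts` and `root_tag` are my own, and so that the example runs without a real Word file it builds a small stand-in archive in memory; a genuine .docx contains many more parts and its XML uses namespaces.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def xml_parts(docx_file):
    """Return the names of the .xml files inside a .docx (a ZIP archive)."""
    with zipfile.ZipFile(docx_file) as archive:
        return [name for name in archive.namelist() if name.endswith(".xml")]


def root_tag(docx_file, part_name):
    """Parse one XML file from the archive and return its root element's tag."""
    with zipfile.ZipFile(docx_file) as archive:
        with archive.open(part_name) as part:
            return ET.parse(part).getroot().tag


def make_stand_in():
    """Build a tiny in-memory ZIP that only imitates the shape of a .docx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", "<document><body/></document>")
        archive.writestr("notes.txt", "not XML")
    return buffer.getvalue()


demo_bytes = make_stand_in()
parts = xml_parts(io.BytesIO(demo_bytes))
tag = root_tag(io.BytesIO(demo_bytes), "word/document.xml")
print(parts)
print(tag)
```

To try this on a real document, pass its filename instead of the in-memory stand-in, for example `xml_parts("report.docx")` where `report.docx` is whatever file you found. No renaming is needed here, because `zipfile` looks at the file's contents and not its extension.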