This morning’s lecture moved from the strict rectangles of structured data to the more generous triangles of semistructured data. It gave an overview of what kinds of data count as “semistructured”; the idea of trees as a mathematical model of data; the particular form of trees in the XPath data model; and their textual representation in XML, the Extensible Markup Language.

XML also has a large number of domain-specific variants. These are all still XML, and each uses a standardised set of element types to give a custom language for representing data relevant to a particular field, from musical scores to financial trading.
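To make the link between XML text and trees concrete, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The small SVG document and the helper names `local_name` and `outline` are invented for illustration and are not from the lecture. The sketch also shows element nodes only, whereas the XPath data model has further node kinds, such as text and attribute nodes.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written SVG document, used purely as illustrative XML.
SVG_TEXT = """<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <title>Example</title>
  <g>
    <circle cx="50" cy="50" r="40"/>
    <rect x="10" y="10" width="20" height="20"/>
  </g>
</svg>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + local_name(element.tag)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SVG_TEXT)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one step down the tree: `svg` is the root, `title` and `g` are its children, and `circle` and `rect` are children of `g`.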

1. Read This
2. Do This
  • Find an SVG file and open it in a text editor to study its XML content.

  • Find a Microsoft Office .docx file and look at the XML content in that. This format (OOXML) is in fact a zipped archive of XML files, so you will need to unzip it first. Depending on your platform, this may require renaming the file extension from .docx to .zip.

    Link: Wikipedia on Microsoft’s OOXML format
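If you would rather not rename and unzip by hand, the same exploration can be sketched in Python with the standard `zipfile` module, which reads a .docx directly because it is an ordinary zip archive. The function names `list_xml_parts` and `read_part_root_tag` are made up for this example, and the code assumes you supply the path to a .docx file of your own.

```python
import zipfile
import xml.etree.ElementTree as ET


def list_xml_parts(docx_file):
    """Return the names of the XML files stored inside a .docx archive.

    docx_file may be a path or an open binary file object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        return [name for name in archive.namelist() if name.endswith(".xml")]


def read_part_root_tag(docx_file, part_name):
    """Parse one XML part of the archive and return its root element's tag."""
    with zipfile.ZipFile(docx_file) as archive:
        with archive.open(part_name) as part:
            return ET.parse(part).getroot().tag
```

For a Word document, calling `list_xml_parts` on your file should list several parts; the main body text normally lives in `word/document.xml`, which is a good one to open in a text editor first.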


To learn more about XML, try any of the following.

