Once we have some semistructured data gathered into an XML tree, we might want to find information within it. For small XML documents we could just look at them, or use text search; for large and very large documents there are dedicated query languages. Tuesday’s lecture presented one of them: XPath, the XML Path Language. As well as being a query language in its own right, XPath is also a key component of many other XML and web technologies, where it is used to navigate around documents.
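As a small taste of path-style queries, here is a sketch using Python's standard `xml.etree.ElementTree` module. Two assumptions to note: ElementTree supports only a limited subset of XPath 1.0 (child and descendant steps, `*`, `.`, `..`, and simple predicates), not the full language; and the `<inventory>` document is invented purely for illustration. The comments give the closest full-XPath equivalent of each query.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML tree, invented purely for illustration.
DOC = """<inventory>
  <item type="fruit"><name>apple</name><price>0.40</price></item>
  <item type="fruit"><name>pear</name><price>0.55</price></item>
  <item type="vegetable"><name>leek</name></item>
</inventory>"""

root = ET.fromstring(DOC)  # root is the <inventory> element

# Every <name> child of every <item>: like XPath /inventory/item/name
all_names = [e.text for e in root.findall("./item/name")]

# Predicate on an attribute value: like //item[@type='fruit']/name
fruit = [e.text for e in root.findall(".//item[@type='fruit']/name")]

# Predicate on a child element's existence: items that have a <price>
priced = [e.find("name").text for e in root.findall("./item[price]")]

# Positional predicate (1-based, as in XPath): the second item's name
second = root.find("./item[2]/name").text

print(all_names)  # ['apple', 'pear', 'leek']
print(fruit)      # ['apple', 'pear']
print(priced)     # ['apple', 'pear']
print(second)     # pear
```

For full XPath 1.0, including axes such as `following-sibling` and functions such as `count()`, a dedicated library is needed; the third-party `lxml` package, for instance, provides an `xpath()` method on its elements.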
I also presented information about ESES, the Edinburgh Student Experience Survey, and about the UTF-8 encoding of Unicode characters, as used in XML and, in time, almost everywhere else.
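The UTF-8 rules can be written out in a few lines. This is a hand-rolled sketch for illustration only (in practice you would simply call `str.encode("utf-8")`), and the function name `utf8_encode` is hypothetical. It picks the sequence length from the size of the code point, puts the high bits in a leading byte (`0xxxxxxx`, `110xxxxx`, `1110xxxx` or `11110xxx`) and the rest in `10xxxxxx` continuation bytes, then checks its answer against Python's built-in codec.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a single Unicode code point as UTF-8, by hand."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded in UTF-8")
    if code_point < 0x80:     # 1 byte:  0xxxxxxx (plain ASCII)
        return bytes([code_point])
    if code_point < 0x800:    # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for ch in "Aé中😀":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with the built-in codec
    print(f"U+{ord(ch):04X} {ch!r}: {encoded.hex(' ').upper()}")
```

The four sample characters need one, two, three and four bytes respectively, and any ASCII character encodes to itself as a single byte.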
Link: Slides for Lecture 11
Complete the ESES survey: you can do this by logging in to MyEd and selecting the “Student Surveys” entry on the left-hand side.
- Wikipedia on XPath.
- The 10-minute XPath tutorial. I think ten minutes is rather optimistic, but I do recommend the tutorial.
- The full XPath specification, XML Path Language (XPath) Version 1.0. This is quite challenging, but I think it is worth browsing to see what a full formal standard looks like.
- Other XML technologies from the World Wide Web Consortium (W3C).
If you liked UTF-8, then I strongly recommend finding out about the extraordinary construction that is Punycode. There’s a formal description in RFC 3492; a useful explanation in the Wikipedia article on Punycode; you can try it yourself with this Punycode converter; and you can confirm that it works for real by visiting the Chinese Internet Network Information Center at http://中国互联网络信息中心.中国.
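To try Punycode without leaving Python, the standard library offers a `punycode` codec (the RFC 3492 algorithm applied to a single string) and an `idna` codec that applies it label by label and adds the `xn--` prefix. This is a minimal sketch, with one caveat: Python's built-in `idna` codec follows the older IDNA 2003 rules (RFC 3490) rather than the current IDNA 2008 ones.

```python
# RFC 3492 Punycode on a single label: the ASCII letters are kept as-is,
# and where to insert "ü" is encoded after the last hyphen.
plain = "bücher".encode("punycode")
print(plain)  # b'bcher-kva'

# IDNA: encode each dot-separated label, prefixing non-ASCII ones with "xn--".
host = "中国互联网络信息中心.中国"
ascii_host = host.encode("idna")
print(ascii_host)

# Decoding reverses the process and recovers the original Unicode name.
round_trip = ascii_host.decode("idna")
print(round_trip == host)  # True
```

The ASCII form printed for `host` is what a browser actually looks up in the DNS when you visit the Unicode address.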