Slides : Recording
Once we have some semistructured data gathered into an XML tree, we might want to find information within it. For small XML documents we can just look at them, or use text search; for large and very large documents there are dedicated query languages. Today’s lecture presented one of them: XPath, the XML Path Language. As well as being a query language in its own right, XPath is also a key component of many other XML and web technologies, where it is used to navigate around documents.
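To give a flavour of path-based navigation, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The document, its element names and its attribute values are all invented for illustration and are not taken from the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every name and value here is made up for the example.
XML_TEXT = """
<library>
  <book year="1999"><title>Alpha</title></book>
  <book year="2004"><title>Beta</title></book>
  <journal year="2004"><title>Gamma</title></journal>
</library>
"""

root = ET.fromstring(XML_TEXT)

# Child steps: every <title> that is a child of a <book> child of the root.
book_titles = [t.text for t in root.findall("./book/title")]

# Descendant step plus attribute predicate: any element with year="2004".
from_2004 = [e.find("title").text for e in root.findall(".//*[@year='2004']")]

print(book_titles)  # ['Alpha', 'Beta']
print(from_2004)    # ['Beta', 'Gamma']
```

A full XPath engine offers more axes, functions and predicates than this subset; the sketch only shows the basic idea of walking a tree by path.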
Continue reading Lecture 11: Navigating XML using XPath
Category Archives: Lecture Log
Lecture 10: Structuring XML

Every well-formed XML document is neatly arranged as a tree, with names for element nodes and all their attributes. This is enough for basic tools to correctly transmit and process XML; but for many applications it is useful to add more precise domain-specific constraints that we expect documents to satisfy. For this we have XML schema languages: specialised languages for describing types of XML document. This lecture covered one in particular, the Document Type Definition language DTD.
Lecture 9: Trees and XML
Slides : Recording
From the strict rectangles of structured data to the more generous triangles of semistructured data. This morning’s lecture gave an overview of what kind of data is seen as “semistructured”; the idea of trees as a mathematical model of data; the particular form of trees in the XPath data model; and their textual representation in XML, the Extensible Markup Language.
XML also has a large number of domain-specific variants. These are all valid XML, and use standardised sets of element types to give a custom language for representing data relevant to a particular field: from musical scores to financial trading.
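The link between XML text and a tree can be sketched in a few lines of Python. This uses the standard `xml.etree.ElementTree` module, whose in-memory tree is close to, but not the same as, the XPath data model (for instance, text and attributes are not separate nodes). The tiny document and the helper `outline` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; the element names are invented for illustration.
XML_TEXT = "<score><part id='p1'><note>C</note><note>E</note></part></score>"

def outline(element, depth=0):
    """Return one (depth, tag) pair per element node, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

tree_rows = outline(ET.fromstring(XML_TEXT))

# Print the tree with indentation showing the nesting depth.
for depth, tag in tree_rows:
    print("  " * depth + tag)
```

Each level of indentation in the output corresponds to one level of element nesting in the XML text.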
Lecture 8: SQL Queries
Slides : Recording
Today was the final lecture on Structured Data and covered a range of database topics: ACID properties for transactions; the NoSQL movement; nested SQL queries, set operations, and aggregate queries; ultimate physical limits to computation; the wonders of the heavens captured in SkyServer; and the idea of doing scientific research and experiments from inside the database.
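As a rough illustration of the three query styles mentioned (aggregate, nested, and set operations), here is a sketch using Python's built-in `sqlite3` module. The tables `student` and `takes` and all their contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data, purely for illustration.
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE takes (student_id INTEGER, course TEXT, mark INTEGER);
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO takes VALUES (1, 'inf1', 70), (1, 'math1', 80), (2, 'inf1', 50);
""")

# Aggregate query: average mark per course.
averages = conn.execute(
    "SELECT course, AVG(mark) FROM takes GROUP BY course ORDER BY course"
).fetchall()

# Nested query: students who take no course at all.
idle = conn.execute(
    "SELECT name FROM student WHERE id NOT IN (SELECT student_id FROM takes)"
).fetchall()

# Set operation: students taking both 'inf1' and 'math1'.
both = conn.execute(
    "SELECT student_id FROM takes WHERE course = 'inf1' "
    "INTERSECT "
    "SELECT student_id FROM takes WHERE course = 'math1'"
).fetchall()

print(averages)  # [('inf1', 60.0), ('math1', 80.0)]
print(idle)      # [('Cy',)]
print(both)      # [(1,)]
```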
Lecture 7: SQL

Today’s lecture introduced the basic structure and format of SQL queries: SELECT … FROM … WHERE …. That’s enough to write a huge range of queries, from single summary statistics to large integrated views that bring together multiple tables.
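Two small queries in that shape, run through Python's built-in `sqlite3` module, might look like this. The `course` and `enrolment` tables and their rows are assumptions made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, invented for this example.
conn.executescript("""
    CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE enrolment (student TEXT, code TEXT, mark INTEGER);
    INSERT INTO course VALUES ('c1', 'Databases'), ('c2', 'Logic');
    INSERT INTO enrolment VALUES ('Ada', 'c1', 72), ('Ben', 'c1', 55), ('Ada', 'c2', 64);
""")

# One table: pick columns with SELECT, filter rows with WHERE.
high = conn.execute(
    "SELECT student, mark FROM enrolment WHERE mark > 60 ORDER BY mark DESC"
).fetchall()

# Two tables: list both after FROM and link them in the WHERE clause.
joined = conn.execute(
    "SELECT e.student, c.title FROM enrolment e, course c "
    "WHERE e.code = c.code AND e.student = 'Ada' ORDER BY c.title"
).fetchall()

print(high)    # [('Ada', 72), ('Ada', 64)]
print(joined)  # [('Ada', 'Databases'), ('Ada', 'Logic')]
```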
Lecture 6: Tuple Relational Calculus
Today, another language for talking about databases. This one is the Tuple Relational Calculus for writing queries that describe information to be extracted from the linked tables of a relational database. There’s a separation of roles here: the tuple relational calculus is good for succinctly stating what we want to find out; while relational algebra from the last lecture describes how to combine and sift tables to extract that information from the data. We distinguish what information we want from how to compute it.
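The declarative flavour of the calculus has a loose analogue in Python set comprehensions, which also state a condition on tuples rather than a sequence of table operations. The sketch below is only an analogy, and the relations `students` and `takes` with their fields are hypothetical.

```python
from collections import namedtuple

# Hypothetical relations, represented as sets of named tuples.
Student = namedtuple("Student", ["sid", "name", "age"])
Takes = namedtuple("Takes", ["sid", "course"])

students = {Student(1, "Ada", 19), Student(2, "Ben", 22), Student(3, "Cy", 25)}
takes = {Takes(2, "db"), Takes(3, "logic")}

# { s | s in Student and s.age > 20 }
older = {s for s in students if s.age > 20}

# { s.name | s in Student and there exists t in Takes
#            with t.sid = s.sid and t.course = "db" }
db_names = {
    s.name
    for s in students
    if any(t.sid == s.sid and t.course == "db" for t in takes)
}

print(sorted(s.name for s in older))  # ['Ben', 'Cy']
print(db_names)                       # {'Ben'}
```

The existential quantifier becomes `any(...)`; a universal quantifier would map to `all(...)` in the same way.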
Lecture 5: Relational Algebra
This morning’s lecture presented a mathematical language for slicing and dicing the structured tables of the relational model: selection, projection, renaming; union, intersection, difference; cross product, join, equijoin and natural join. A key feature of this relational algebra is that just six of these operations are enough to capture an extremely wide range of queries and transformations of data. Database implementors work hard to build highly efficient engines to carry out these operations, which can then support many different kinds of user application.
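Three of these operations can be sketched directly in Python, treating a relation as a list of dictionaries. This is a toy illustration under that assumption, not how a database engine implements them; the function names and sample data are invented.

```python
def select(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the named attributes, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def natural_join(left, right):
    """Natural join: pair rows that agree on every shared attribute.

    With no shared attributes this degenerates to the cross product.
    """
    result = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[a] == r[a] for a in shared):
                result.append({**l, **r})
    return result

# Hypothetical relations.
students = [{"sid": 1, "name": "Ada"}, {"sid": 2, "name": "Ben"}]
takes = [
    {"sid": 1, "course": "db"},
    {"sid": 1, "course": "logic"},
    {"sid": 2, "course": "db"},
]

# Names of students taking "db": join, then select, then project.
db_students = project(
    select(natural_join(students, takes), lambda r: r["course"] == "db"),
    ["name"],
)
print(db_students)  # [{'name': 'Ada'}, {'name': 'Ben'}]
```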
Lecture 4: From ER Diagrams to Relational Models
Today’s lecture reviewed the high-level conceptual language of ER diagrams and the more concrete structures of the relational model; followed by some recipes for translating from the first into the second. This isn’t always an exact match, and for any particular ER diagram we might go back to its original scenario description to decide how to best represent it as a relational model. Even so, this kind of step-by-step staging towards a fully formal representation is an effective route to capturing the subtleties of real-world systems.
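One common recipe, sketched here with Python's built-in `sqlite3` module, turns each entity into a table and a many-to-many relationship into a table of its own whose key combines the keys of the entities it links. The ER diagram assumed (entities Student and Course, relationship Takes with a `mark` attribute) is hypothetical and not taken from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: two entity tables and one relationship table.
conn.executescript("""
    CREATE TABLE student (matric TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE takes (
        matric TEXT REFERENCES student(matric),
        code TEXT REFERENCES course(code),
        mark INTEGER,
        PRIMARY KEY (matric, code)
    );
    INSERT INTO student VALUES ('s1', 'Ada');
    INSERT INTO course VALUES ('c1', 'Databases');
    INSERT INTO takes VALUES ('s1', 'c1', 70);
""")

def try_insert(matric, code):
    """Try to record that a student takes a course; report whether it worked."""
    try:
        conn.execute("INSERT INTO takes VALUES (?, ?, NULL)", (matric, code))
        return True
    except sqlite3.IntegrityError:
        return False

# Rejected: there is no course 'zz', so the foreign key constraint fails.
print(try_insert("s1", "zz"))  # False
```

The foreign keys capture the requirement that a relationship instance can only link entities that actually exist.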
Lecture 3: The Relational Model
Today’s lecture expanded on last week’s material on Entity-Relationship modelling, and then set out the basic elements of the Relational Model for structured data. While ER diagrams provide a conceptual language for describing things as they are, and have applications outside databases for general organisation and management, the relational model is explicitly intended as a mathematically precise scheme for the computer-assisted creation and querying of large datasets.
Lecture 2: Entities and Relationships
Today’s lecture opened the topic of structured data with an introduction to Entity-Relationship Modelling and the graphical language of ER diagrams. This language is important for planning and structuring databases, bridging the gap between informal conceptual design and the logical precision required for machine implementation.