Lecture 12: Corpora

In literature a corpus (plural corpora) is a collection of written texts, in particular the complete works of a single author or a body of writing on a single subject. In computational linguistics and in theoretical linguistics a corpus is a body of written or spoken text used for study of a particular language or language variety. These corpora may be very large (billions of words) and provide the raw material for experimental investigation of real-world language use: the science of empirical linguistics.

Note: There was a photocopied handout in today’s lecture, and all copies were taken. I’ve placed a large number of additional copies in the ITO; see homework below for details.

Lecture 11: Navigating XML using XPath

Once we have some semistructured data gathered into an XML tree, we might want to find information within it. For small XML documents we could just read through them, or use text search; for large and very large documents there are dedicated query languages. Tuesday’s lecture presented one of them: XPath, the XML Path Language. As well as being a query language in its own right, XPath is also a key component of many other XML and web technologies, where it is used to navigate around documents.
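As a rough illustration of navigating a document with path expressions, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The sample document, its element names and its attribute `n` are invented for this example and are not taken from the lecture.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for illustration only.
XML = """
<course code="Inf1-DA">
  <lecture n="9"><title>XML</title></lecture>
  <lecture n="10"><title>Structuring XML</title></lecture>
  <lecture n="11"><title>Navigating XML using XPath</title></lecture>
</course>
"""

root = ET.fromstring(XML.strip())

# Path step by step: every <title> child of every <lecture> child of the root.
titles = [t.text for t in root.findall("./lecture/title")]

# Predicate on an attribute: the <title> of the lecture whose n attribute is "11".
lecture_11 = root.find("./lecture[@n='11']/title").text

print(titles)
print(lecture_11)
```

A full XPath engine offers more axes and functions than this subset; the sketch only shows the basic idea of selecting nodes by path and predicate.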

Lecture 10: Structuring XML

Every well-formed XML document is neatly arranged as a tree, with names for element nodes and all their attributes. This is enough for basic tools to correctly transmit and process XML; but for many applications it is useful to add more precise domain-specific constraints that we expect documents to satisfy. For this we have XML schema languages: specialised languages for describing types of XML document. This lecture covered one in particular, the Document Type Definition language DTD.

Lecture 9: XML

From the strict rectangles of structured data to the more generous triangles of semistructured data. This lecture gave an overview of what might qualify data as semistructured; trees in general as a mathematical model of data; the particular form of trees in the XPath data model; and their textual representation in XML, the Extensible Markup Language.

Finally, some examples of real XML data: from musical scores to financial trading.

Lecture 8: SQL Queries

Today’s lecture was the final one on Structured Data and covered a range of database topics: ACID properties for transactions; the NoSQL movement; nested SQL queries, set operations, and aggregate queries; ultimate physical limits to computation; the wonders of nature captured in SkyServer; and the idea of doing scientific research and experiments from inside the database.
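The three kinds of query mentioned above (aggregate, nested, and set operations) can be sketched with Python's built-in `sqlite3` module. The `student` and `takes` tables and all their rows are hypothetical, invented for this example rather than drawn from the lecture.

```python
import sqlite3

# In-memory database with made-up tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE takes (student_id INTEGER, course TEXT, mark INTEGER);
INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cat');
INSERT INTO takes VALUES (1, 'DA', 70), (1, 'OOP', 60), (2, 'DA', 50), (3, 'OOP', 80);
""")

# Aggregate query: average mark per course.
avg_marks = conn.execute(
    "SELECT course, AVG(mark) FROM takes GROUP BY course ORDER BY course"
).fetchall()

# Nested query: students with at least one mark above the overall average.
above_average = conn.execute(
    """SELECT name FROM student
       WHERE id IN (SELECT student_id FROM takes
                    WHERE mark > (SELECT AVG(mark) FROM takes))
       ORDER BY name"""
).fetchall()

# Set operation: students who take DA but not OOP.
da_only = conn.execute(
    """SELECT student_id FROM takes WHERE course = 'DA'
       EXCEPT
       SELECT student_id FROM takes WHERE course = 'OOP'"""
).fetchall()

print(avg_marks)      # [('DA', 60.0), ('OOP', 70.0)]
print(above_average)  # [('Ann',), ('Cat',)]
print(da_only)        # [(2,)]
conn.close()
```

SQLite is used here only because it ships with Python; the same SQL should run with little or no change on other relational database systems.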

Lovelace Colloquium

Ada Lovelace by Sydney Padua
BCSWomen
Lovelace Colloquium 2015
Informatics Forum
Thursday 9 April 2015

One-day conference for women students of Computing and related subjects

We’re proud to be hosting the colloquium at Edinburgh this year.

The aims of the event are:

  • To provide a forum for women undergraduate and masters students to share their ideas and network;
  • To provide a stimulating series of talks from women in computing, both from academia and industry;
  • To provide both formal (talks) and informal (networking) advice to undergraduate women about careers in computing from a female perspective.

There are poster competitions for women students at all levels of study. Everyone with a poster accepted for the meeting is eligible for travel funding to attend, and there are additional cash prizes from industry sponsors.

Link: Poster contest — Enter your 250-word abstract by 28 February

The organisers are also looking for students to help coordinate social events. If you don’t want to make a poster but are interested in getting involved, then please contact Amy.Guy@ed.ac.uk.

See also the Edinburgh University Hoppers for women in Informatics.

Links: Lovelace Colloquium; Edinburgh University Hoppers

Inf1-DA 2014–2015