The written assignment for Inf1-DA is now available. It’s a set of exam questions from previous years, and you have until Thursday 23 March to complete it.
Link: Coursework assignment
The tutorials web page now has notes and solutions for Tutorial 5, as well as exercises for Tutorial 6.
Continue reading Tutorials: Notes and Solutions, New Exercises
Today’s lecture described some of the annotations added to text corpora, how they are generated, and some simple analyses. It also indicated how these relate to applications such as empirical linguistics and the engineering of systems that work with natural languages.
Continue reading Lecture 13: Annotation of Corpora
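As a rough illustration of the kind of simple analysis that annotation makes possible, here is a minimal Python sketch. It assumes a tiny hand-annotated sample in which every token carries a part-of-speech tag. The sentence, the tag names and the variable names are all made up for illustration and are not taken from the lecture.

```python
from collections import Counter

# Hypothetical hand-annotated sample: (token, part-of-speech tag) pairs.
TAGGED = [
    ("the", "DT"), ("cat", "NN"), ("sat", "VBD"),
    ("on", "IN"), ("the", "DT"), ("mat", "NN"),
]

# How often does each tag occur?
tag_counts = Counter(tag for _, tag in TAGGED)

# How often does each word form occur?
word_counts = Counter(word for word, _ in TAGGED)

# How often does one tag directly follow another (tag bigrams)?
tags = [tag for _, tag in TAGGED]
tag_bigrams = Counter(zip(tags, tags[1:]))

print(tag_counts)
print(word_counts)
print(tag_bigrams)
```

Counts like these are only a starting point, but they show why annotation helps: the tag bigram count needs the tags, not just the raw words.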
In literature, a corpus (plural corpora) is a collection of written texts, in particular the complete works of a single author or a body of writing on a single subject. In computational linguistics and in theoretical linguistics, a corpus is a body of written or spoken text used for the study of a particular language or language variety. These corpora may be very large (billions of words) and provide the raw material for experimental investigation of real-world language use: the science of empirical linguistics.
Continue reading Lecture 12: Corpora
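One common way to investigate language use in a corpus is a concordance, which lists each occurrence of a word together with its surrounding context. The sketch below is a minimal version in Python. The two-sentence "corpus", the function name `concordance` and its `width` parameter are assumptions chosen for illustration; real corpora are vastly larger and need more careful tokenisation.

```python
import re

# A made-up two-sentence "corpus" for illustration only.
CORPUS = [
    "The cat sat on the mat.",
    "A dog chased the cat.",
]

def concordance(texts, keyword, width=2):
    """Return (left context, keyword, right context) for every match."""
    hits = []
    for text in texts:
        # Very crude tokenisation: lower-case runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits

hits = concordance(CORPUS, "cat")
for left, word, right in hits:
    print(f"{left:>12} [{word}] {right}")
```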
Slides : Recording
Once we have some semistructured data gathered into an XML tree, we might want to find information within it. For small XML documents we can just look at them, or use text search; for large and very large documents there are dedicated query languages. Today’s lecture presented one of them: XPath, the XML Path Language. As well as being a query language in its own right, XPath is also a key component of many other XML and web technologies, where it is used to navigate around documents.
Continue reading Lecture 11: Navigating XML using XPath
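To give a flavour of path-based navigation, here is a small Python sketch using the standard library module `xml.etree.ElementTree`. Note that this module supports only a limited subset of XPath, so it is a stand-in rather than a full XPath engine. The XML document, element names and attribute values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up XML document; the element and attribute names are hypothetical.
DOC = """<courses>
  <course code="inf1-da" year="1">
    <title>Data and Analysis</title>
  </course>
  <course code="example-2" year="2">
    <title>Hypothetical Second-Year Course</title>
  </course>
</courses>"""

root = ET.fromstring(DOC)  # root is the <courses> element

# Path step by step: every <title> child of every <course> child of the root.
all_titles = [t.text for t in root.findall("./course/title")]

# Predicate on an attribute: titles of courses whose year attribute is "1".
year1_titles = [t.text for t in root.findall(".//course[@year='1']/title")]

# Read an attribute value from each matched element.
codes = [c.get("code") for c in root.findall("course")]

print(all_titles)
print(year1_titles)
print(codes)
```

Each query string describes a path down the tree from the root, optionally filtered by a predicate in square brackets.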
LEGO have today announced the latest model to come out of their Ideas fan-design programme. It’s the “Women of NASA” set which includes Margaret Hamilton: computer scientist, software engineer, and leader of the programming team for the Apollo and Skylab space missions.
Director of Apollo Flight Computer Programming for the moon landings and other NASA missions. CEO, Hamilton Technologies.
The tutorials web page now has the latest set of tutorial exercises. For these you get to work directly with XML and XPath, using the xmllint command-line tool.
Continue reading Tutorial Exercises: XML and XPath
Exercises for Tutorial 4 are on the tutorial page. These involve writing and executing SQL queries using the LibreOffice Base desktop application. This is installed on all DICE machines, and the tutorial includes detailed usage notes: using DICE is probably the simplest way to complete the tutorial. It is also possible to do the exercises on university open-access machines or your own computer, and there are some instructions to help with that.
This tutorial includes more substantial starred extension exercises to try out other ways to access the same database.
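For readers curious about running SQL from a program rather than a desktop application, the following is a minimal sketch using Python's built-in `sqlite3` module. It does not use the tutorial database or LibreOffice Base: the `student` table, its columns and its rows are all invented, and SQLite is simply an assumption made to keep the example self-contained.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, invented for this example.
conn.execute(
    "CREATE TABLE student (uun TEXT PRIMARY KEY, name TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("s001", "Grace", 1), ("s002", "Ada", 1), ("s003", "Alan", 2)],
)

# A simple selection with a parameter, sorted by name.
rows = conn.execute(
    "SELECT name FROM student WHERE year = ? ORDER BY name", (1,)
).fetchall()

# An aggregate query: number of students in each year.
per_year = conn.execute(
    "SELECT year, COUNT(*) FROM student GROUP BY year ORDER BY year"
).fetchall()

conn.close()
print(rows)
print(per_year)
```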
Every well-formed XML document is neatly arranged as a tree, with names for element nodes and all their attributes. This is enough for basic tools to correctly transmit and process XML; but for many applications it is useful to add more precise domain-specific constraints that we expect documents to satisfy. For this we have XML schema languages: specialised languages for describing types of XML document. This lecture covered one in particular, the Document Type Definition (DTD) language.
Continue reading Lecture 10: Structuring XML
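The difference between a well-formed document and one that also satisfies a DTD can be sketched in Python. The standard library parser used here checks well-formedness only and does not validate against the DTD. Checking validity needs a validating tool, for example `xmllint --valid`. The DTD and documents below are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical DTD: a reading list holds one or more books, and each book
# has a title, one or more authors, and a required year attribute.
DTD = """<!DOCTYPE reading-list [
  <!ELEMENT reading-list (book+)>
  <!ELEMENT book (title, author+)>
  <!ATTLIST book year CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
]>"""

# Well-formed and consistent with the DTD.
VALID_DOC = DTD + """
<reading-list>
  <book year="2017"><title>Example Title</title><author>A. Author</author></book>
</reading-list>"""

# Still well-formed, but it breaks the DTD: no year attribute, no author.
INVALID_DOC = DTD + """
<reading-list>
  <book><title>Another Title</title></book>
</reading-list>"""

# Not even well-formed: <book> is never closed.
BROKEN_DOC = "<reading-list><book></reading-list>"

def is_well_formed(text):
    """True if the text parses as XML; says nothing about DTD validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

for name, doc in [("valid", VALID_DOC), ("invalid", INVALID_DOC),
                  ("broken", BROKEN_DOC)]:
    print(name, is_well_formed(doc))
```

The second document passes the well-formedness check even though it violates the declared structure, which is exactly the gap that a schema language such as DTD is meant to close.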