Lecture 14: Example Corpora Applications

Slides; Recording

Corpora are widely used for computational research into language, and for engineering natural-language computer systems. In linguistics, they make it possible to do real experimental science: to formulate hypotheses about the structure of languages, or changes in language between different places, times or people; and then test these on data. In building applications that handle text or speech, corpora provide the mass quantities of raw material used for machine learning and other algorithms.

Lecture 12: Corpora

Slides; Recording

In literature, a corpus (plural corpora) is a collection of written texts, in particular the complete works of a single author or a body of writing on a single subject. In computational and theoretical linguistics, a corpus is a body of written or spoken text used for the study of a particular language or language variety. These corpora may be very large (billions of words) and provide the raw material for experimental investigation of real-world language use: the science of empirical linguistics.
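One standard way to explore a corpus is a concordance, also called keyword in context (KWIC). It lists every occurrence of a word with its surrounding text, so that real usage can be inspected directly. The sketch below is a minimal illustration, not a tool from the lecture. The function name concordance, the character-based context width, and the sample text are all assumptions made for the example.

```python
import re


def concordance(text: str, keyword: str, width: int = 20) -> list[str]:
    """Return keyword-in-context lines: each whole-word occurrence of `keyword`
    (case-insensitive) with up to `width` characters of context on each side."""
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines


sample = ("A corpus is a body of text. Linguists search a corpus for patterns, "
          "and a large corpus can contain billions of words.")

for line in concordance(sample, "corpus"):
    print(line)
```

Real concordancers usually work on tokens rather than characters and can sort lines by their left or right context. Both changes would fit into the same loop.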

Lecture 11: Navigating XML using XPath

Slides; Recording

Once we have some semistructured data gathered into an XML tree, we might want to find information within it. For small XML documents we can just look at them, or use text search; for large and very large documents there are dedicated query languages. Today’s lecture presented one of them: XPath, the XML Path Language. As well as being a query language in its own right, XPath is also a key component of many other XML and web technologies, where it is used to navigate around documents.
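Python's standard-library ElementTree module supports only a limited subset of XPath syntax, but that subset is enough to show a few core path expressions. Libraries such as lxml offer full XPath 1.0. The sketch below is illustrative only. The small corpus-style document, with its element names text and s and attributes id, lang and n, is made up for the example.

```python
import xml.etree.ElementTree as ET

# A small made-up corpus document; element and attribute names are illustrative.
doc = """<corpus>
  <text id="t1" lang="en">
    <s n="1">Corpora provide raw material.</s>
    <s n="2">XPath finds things in trees.</s>
  </text>
  <text id="t2" lang="fr">
    <s n="1">Bonjour le monde.</s>
  </text>
</corpus>"""

root = ET.fromstring(doc)

# ".//s": every <s> element at any depth below the root (like //s in XPath).
all_sentences = [s.text for s in root.findall(".//s")]

# "./text[@lang='fr']/s": child <text> elements with lang='fr', then their <s> children.
fr_sentences = [s.text for s in root.findall("./text[@lang='fr']/s")]

# "./text[1]/s[2]": positional predicates; as in XPath, positions count from 1.
second_en = root.find("./text[1]/s[2]").text

# ElementTree paths select elements, so attribute values are read with .get().
ids = [t.get("id") for t in root.findall("./text")]

print(all_sentences)
print(fr_sentences)
print(second_en)
print(ids)
```

In full XPath the last step could instead be written //text/@id, selecting the attribute nodes directly. ElementTree does not support that form, which is why the sketch reads attributes in Python.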

New LEGO Computer Scientist Minifig: Margaret Hamilton

LEGO Women of NASA
LEGO have today announced the latest model to come out of their Ideas fan-design programme. It’s the “Women of NASA” set, which includes Margaret Hamilton: computer scientist, software engineer, and leader of the programming team for the Apollo and Skylab space missions.

This complements LEGO’s existing computer programmer minifig. Sadly, Lovelace and Babbage didn’t make it this time around.

Links: Announcement; Set design

Margaret Hamilton
Software Engineer

Director of Apollo Flight Computer Programming for the moon landings and other NASA missions. CEO, Hamilton Technologies.

Links: Work at NASA; Hamilton Technologies

Tutorial Exercises

Exercises for Tutorial 4 are on the tutorial page. These involve writing and executing SQL queries using the LibreOffice Base desktop application. This is installed on all DICE machines, and the tutorial includes detailed usage notes: using DICE is probably the simplest way to complete the tutorial. It is also possible to do the exercises on university open-access machines or your own computer, and there are some instructions to help with that.

This tutorial also includes some more substantial starred extension exercises, which try out other ways to access the same database:

  1. Setting up queries using the LibreOffice Base graphical query designer.
  2. Connecting on the command-line to a remote PostgreSQL database server.
  3. Linking LibreOffice Base to the remote database server.
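The tutorial itself uses LibreOffice Base and PostgreSQL. As a self-contained stand-in, the sketch below runs the same kinds of SQL query through Python's built-in sqlite3 module against an in-memory SQLite database. The student table and its rows are hypothetical, not the tutorial's actual schema. SQL details such as types and some functions also differ between SQLite and PostgreSQL.

```python
import sqlite3

# In-memory SQLite database standing in for the tutorial database.
# The "student" table and its rows are hypothetical, not the tutorial's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [("s001", "Ada", 1), ("s002", "Grace", 2), ("s003", "Margaret", 1)],
)

# Selection and ordering: names of first-year students, alphabetically.
# The "?" placeholder lets sqlite3 substitute the value safely.
first_years = [name for (name,) in conn.execute(
    "SELECT name FROM student WHERE year = ? ORDER BY name", (1,))]

# Aggregation: number of students in each year.
per_year = conn.execute(
    "SELECT year, COUNT(*) FROM student GROUP BY year ORDER BY year").fetchall()

conn.close()
print(first_years)
print(per_year)
```

The SELECT statements are plain SQL, so queries of the same shape can be typed into other database front ends. Column types and dialect details may need adjusting for the tutorial's database.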