Corpora are widely used for computational research into language, and for engineering natural-language computer systems. In linguistics, they make it possible to do real experimental science: to formulate hypotheses about the structure of languages, or changes in language between different places, times or people; and then test these on data. In building applications that handle text or speech, corpora provide the mass quantities of raw material used for machine learning and other algorithms.
I have put more course material online.
Notes and solutions for Tutorial 4, with a LibreOffice Base file of SQL queries.
Exercises for Tutorial 6, with instructions for the CQP tool you will be using.
Link: Tutorial exercises and notes
Regrettably, there is no recording of Lecture 13 from 2017. Last year’s blog post has slides and references; for video you might do best to go back to the course from 2014/15.
As I write this, the University and Colleges strike is still set to run from Monday to Thursday this week.
Universities UK and UCU have agreed to further talks mediated by the conciliation service ACAS. These will begin tomorrow. However, UUK have not conceded that any of the disputed pension changes are open to review and the strike action continues.
Link: Union statement on further talks
I think it’s excellent that we are now seeing some talks, but I haven’t yet heard anything from UUK suggesting progress toward any actual change. Several university vice-chancellors, though, have made such commitments for their own universities. Here at Edinburgh the Principal, Prof. Mathieson, is meeting with staff on Monday after more than 300 professors sent him an open letter.
(The letter was limited to professors because they have a formal role in the governing of the University as part of the Senate.)
Link: Letter to University Principal
Today’s lecture was cancelled owing to the heavy snowfall. In its place I recommend you watch the recording from last year, linked on the right. I did trial a YouTube Live stream yesterday, but in practice it didn’t offer much over what this earlier screencast does. I have changed the reading and references slightly from last time: the handout “What is a corpus and what is in it?” is no longer required. In the recording you will also see some TopHat questions used last year. I don’t think those turned out too well in the lectures, but I will look into recovering them for you to try outside lectures.
Staff in the University and College Union (UCU) will be on strike on Monday, Tuesday and Wednesday this week. This is part of a continuing dispute with Universities UK (UUK) over proposed changes to pension arrangements. Further strikes are planned over the following weeks: this schedule may change if the parties involved return to talks or take up arbitration.
Link: University guidance for students
This is the event that Riccardo Fiorista advertised at the start of Friday’s lecture. He’s a second-year Informatics student who has set this up with other students from Edinburgh and Heriot-Watt.
Links: Start|ED event site; Map; Booking information
A well-formed XML document is one that is properly arranged as a tree, with names for element nodes and all their attributes. This is enough for basic tools to correctly transmit and process XML; but for many applications it is useful to add more precise domain-specific constraints that we expect documents to satisfy. For this we have XML schema languages: specialised languages for describing types of XML document. This lecture covered one in particular, the Document Type Definition language DTD.
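As a rough illustration of what "well-formed" means in practice, here is a minimal Python sketch that asks a standard parser whether some text is arranged as a single tree. The `playlist` and `track` element names and the helper `is_well_formed` are invented for this example. It checks well-formedness only: testing a document against a DTD needs a validating parser, which Python's standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as one properly nested XML tree."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser rejects anything that is not a single tree:
        # unclosed tags, overlapping tags, more than one root, and so on.
        return False


# Hypothetical documents, made up for this sketch.
good = "<playlist><track length='215'>Intro</track></playlist>"
bad = "<playlist><track>Intro</playlist></track>"  # tags overlap: not a tree

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document that passes this check may still break domain-specific rules, for example a playlist with no tracks at all. That is the kind of constraint a schema language such as DTD is there to express.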
Notes for Tutorial 3 are now online, as well as the next set of exercises. These include practice working directly with SQL in the LibreOffice desktop database tool. There are extension exercises on working with GUI query generation and also command-line interaction with a local PostgreSQL database server.
Link: Tutorial Exercises
The story of the six women who were the original programmers for ENIAC: the all-electronic programmable computer developed in secret during World War II by the US Army. This documentary includes interviews and discussion of their work and impact on computer science.
This evening CompSoc Edinburgh and Bloomberg present a showing of The Computers, followed by a panel discussion on diversity and inclusion.
Tuesday 13 February 2018
Appleton Tower Lecture Theatre 2
Links: The ENIAC Programmers Project; CompSoc Edinburgh; Register to attend; Facebook; Google Calendar
From the strict rectangles of structured data to the more generous triangles of semistructured data. This morning’s lecture gave an overview of what kind of data is seen as “semistructured”; the idea of trees as a mathematical model of data; the particular form of trees in the XPath data model; and their textual representation in XML — the Extensible Markup Language.
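To make the tree view concrete, here is a small Python sketch that parses an XML text into a tree, walks it in document order, and selects nodes with path expressions. The `course` document and the helper `show` are invented for illustration. Note also that Python's `ElementTree` supports only a limited subset of XPath and simplifies the data model (text is stored on the element rather than as a separate text node), so treat this as an approximation of the ideas rather than a full XPath implementation.

```python
import xml.etree.ElementTree as ET

# A hypothetical document, made up for this sketch.
doc = """
<course code="X1">
  <lecture number="1"><title>Trees</title></lecture>
  <lecture number="2"><title>XPath</title></lecture>
</course>
""".strip()

root = ET.fromstring(doc)


def show(node, depth=0):
    """Return one indented line per element node, in document order."""
    lines = ["  " * depth + node.tag]
    for child in node:
        lines.extend(show(child, depth + 1))
    return lines


print("\n".join(show(root)))

# Path expressions navigate from the root element down through the tree.
titles = [t.text for t in root.findall("./lecture/title")]

# A predicate in square brackets filters on an attribute value.
second = root.find("./lecture[@number='2']/title").text

print(titles)
print(second)
```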
XML also has a large number of domain-specific variants. These are all valid XML, and use standardised sets of element types to give a custom language for representing data relevant to a particular field: from musical scores to financial trading.