Today’s lecture presented the idea of correlation in data sets: observing correlations through scatter plots; measuring them with the correlation coefficient; and using hypothesis testing to judge whether an observed correlation is stronger than chance coincidence alone would explain. Each step gives a more precise and sensitive way of detecting correlation.

Although, remember: correlation does not imply causation. More on that next time.
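To make the last two steps concrete, here is a minimal sketch in Python. It computes the Pearson correlation coefficient directly from its definition and then estimates a p-value with a permutation test. Note the assumptions: the data values are invented purely for illustration, the function names are my own, and the permutation test is a stand-in for whichever significance test you apply by hand with tables, not necessarily the one from the lecture.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, trials=2000, seed=0):
    """Estimate how often shuffled data gives a correlation at least as strong.

    Null hypothesis: xs and ys are unrelated, so any pairing of the
    values is as likely as any other.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials


# Invented illustrative data, not taken from the lecture.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

print("r =", pearson_r(xs, ys))
print("estimated p-value =", permutation_p_value(xs, ys))
```

A small p-value says only that a correlation this strong would rarely arise by chance pairing. It says nothing about why the two variables move together.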

Notes and solutions for the Information Retrieval tutorial are now online.

I have rearranged the content of the remaining two tutorials. Because of the strike action there is no written assignment this year; normally this would be a practice exam paper, reviewed in Week 11. Instead, the tutorial exercise for next week is two practice exam questions, and in the tutorial itself you will work with your tutor through assessing your solutions against the original examiners’ marking guidelines.

The final tutorial exercises will be in the following week on the topic of Statistical Analysis. By that time I will have addressed the necessary syllabus content in lectures.

Link: Tutorial exercises

Slides : Recording

Corpora are widely used for computational research into language, and for engineering natural-language computer systems. In linguistics, they make it possible to do real experimental science: to formulate hypotheses about the structure of languages, or changes in language between different places, times or people; and then test these on data. In building applications that handle text or speech, corpora provide the mass quantities of raw material used for machine learning and other algorithms.

I have put more course material online.

  • Notes and solutions for Tutorial 4, with a LibreOffice Base file of SQL queries.

  • Exercises for Tutorial 6, with instructions for the CQP tool you will be using.

Link: Tutorial exercises and notes

Regrettably, there is no recording of Lecture 13 from 2017. Last year’s blog post has slides and references; for video you might do best to go back to the course from 2014/15.

As I write this, the University and Colleges strike is still set to run from Monday to Thursday this week.
Universities UK and UCU have agreed to further talks mediated by the conciliation service ACAS. These will begin tomorrow. However, UUK have not conceded that any of the disputed pension changes are open to review and the strike action continues.

Link: Union statement on further talks

I think it’s excellent that we are now seeing some talks, but I haven’t yet heard anything from UUK suggesting progress toward any actual change. Several university vice-chancellors, though, have made such commitments for their own universities. Here at Edinburgh the Principal, Prof. Mathieson, is meeting with staff on Monday after 300+ professors sent him an open letter.

(The letter was limited to professors because they have a formal role in the governing of the University as part of the Senate.)

Link: Letter to University Principal

Slides : Recording

Today’s lecture was cancelled owing to the heavy snowfall. In its place I recommend you watch the recording from last year, linked on the right. I did trial a YouTube Live stream yesterday, but in practice it didn’t offer much over what this earlier screencast does. I have changed the reading and references slightly from last time: the handout “What is a corpus and what is in it?” is no longer required. In the recording you will also see some TopHat questions used last year. I don’t think those turned out too well in the lectures, but I will look into recovering them for you to try outside lectures.


Start|ED: Working for a Start-Up

This is the event that Riccardo Fiorista advertised at the start of Friday’s lecture. He’s a second-year Informatics student who has set this up with other students from Edinburgh and Heriot-Watt.

What: Start|ED, a student-run event to meet start-ups and find out about working for them
When: 1300–1800 Wednesday 28 February 2018
Where: Dovecot Studios, 10 Infirmary Street, Edinburgh
How: Book your ticket online

Links: Start|ED event site; Map; Booking information

Slides : Recording

A well-formed XML document is one that is properly arranged as a tree, with names for element nodes and all their attributes. This is enough for basic tools to correctly transmit and process XML; but for many applications it is useful to add more precise domain-specific constraints that we expect documents to satisfy. For this we have XML schema languages: specialised languages for describing types of XML document. This lecture covered one in particular, the Document Type Definition language DTD.
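The difference between well-formed and valid can be seen in a minimal Python sketch. It uses the standard library's `xml.etree.ElementTree`, which checks well-formedness only and does not validate against a DTD. The `note`, `to` and `body` element names and the helper `is_well_formed` are hypothetical, chosen just for this illustration.

```python
import xml.etree.ElementTree as ET

# A properly nested tree: well-formed.
WELL_FORMED = "<note><to>Class</to><body>Tutorial moved</body></note>"

# Tags close in the wrong order, so this is not a tree: not well-formed.
NOT_WELL_FORMED = "<note><to>Class</note></to>"

# Well-formed, but it breaks its own DTD: the DTD requires a <to>
# element before <body>, and none is present.
WELL_FORMED_BUT_INVALID = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><body>Tutorial moved</body></note>"""


def is_well_formed(text):
    """Return True if text parses as XML; no DTD validation is done."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


for label, doc in [
    ("well-formed", WELL_FORMED),
    ("bad nesting", NOT_WELL_FORMED),
    ("violates its DTD", WELL_FORMED_BUT_INVALID),
]:
    print(label, "->", is_well_formed(doc))
```

The third document passes this check even though it does not satisfy its DTD. Catching that kind of error needs a validating parser, which is beyond this sketch.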

Notes for Tutorial 3 are now online, as well as the next set of exercises. These include practice working directly with SQL in the LibreOffice desktop database tool. There are extension exercises on working with GUI query generation and also command-line interaction with a local PostgreSQL database server.

Link: Tutorial Exercises