## Lecture 17: Data Scales and Summary Statistics

Baseline information about statistics; some kinds of data, some kinds of analysis. Also, gravitational waves and the big bang.

## Lecture 16: Vector Spaces for Information Retrieval

The vector space model for information retrieval treats documents as vectors in a very high-dimensional space: a dimension for every distinct word, with the vector coordinate being the number of times the word occurs in the document. In a collection …
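As a rough illustration of the idea, here is a minimal Python sketch that turns documents into word-count vectors. The function names `build_vocabulary` and `term_frequency_vector`, the two toy documents, and the lowercase whitespace tokenisation are all assumptions made for this example, not something taken from the lecture.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return a sorted list of every distinct word across the documents.

    Assumption: words are lowercased and split on whitespace.
    """
    return sorted({word for doc in documents for word in doc.lower().split()})


def term_frequency_vector(document, vocabulary):
    """Map a document to a vector with one coordinate per vocabulary word.

    Each coordinate is the number of times that word occurs in the document.
    """
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Hypothetical toy collection, for illustration only.
documents = ["the cat sat on the mat", "the dog sat"]
vocabulary = build_vocabulary(documents)
vectors = [term_frequency_vector(doc, vocabulary) for doc in documents]

print(vocabulary)  # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors[0])  # [1, 0, 1, 1, 1, 2]
print(vectors[1])  # [0, 1, 0, 0, 1, 1]
```

With only six distinct words the vectors are short; in a real collection the vocabulary runs to many thousands of words, and most coordinates of any one document's vector are zero.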

## Tutorial 6: Solutions

I have posted notes on solutions to Tutorial 6 to the course web page and to NB. Following feedback from tutors and students, these also contain a few corrections and clarifications. Link: Tutorial exercises on the Inf1-DA web page

## Tutorial 7: Information Retrieval

Exercise sheet posted on course webpage and NB.

## Lecture 15: Information Retrieval

Unstructured data; specifying the Information Retrieval problem and evaluating solutions.

## Lecture 14: Example Corpora Applications

Corpora are widely used for computational research into language, and for engineering natural-language computer systems. In linguistics, they make it possible to do real experimental science: formulate hypotheses about the structure of languages, or changes in language between different places, …

## Coursework Assignment

The Inf1-DA web page now has details for the written coursework assignment. It’s a copy of last year’s exam, although instead of two hours in an exam hall you have two weeks to complete and submit your solutions. Follow the …

## Tutorial 6: Corpus Querying

Tutorial 6, on querying a corpus, is now available on the course web page.

## Lecture 13: Annotation of Corpora

Annotation and analyses of corpora: part-of-speech tagging; syntactic structure; concordances, frequencies and n-grams.

## Lecture 12: Corpora

Introduction to what a corpus is and how corpora are built; copies of the reading handout are in the ITO.

