This final lecture of the course covered worked solutions to the two past exam questions set as homework last Friday.

Continue reading Lecture 21: Exam Preparation

# Category Archives: Lecture Log

# Lecture 20: Course Review

This afternoon’s lecture covered arrangements for the exam, advice on revision and preparation, as well as a summary of the course topics.

# Lecture 19: chi² Testing on Categorical Data

Today’s lecture covered more on hypothesis testing, presenting the χ² test and working through three examples: student Inf1-DA exam results in 2011, bigram frequency in the British National Corpus, and possible gender bias in student admissions to Berkeley in 1973.
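
As a rough illustration of the χ² calculation, the sketch below computes the statistic for a table of observed counts, taking the expected counts from the row and column totals. The function name `chi_squared` and the counts are made up for illustration; they are not the lecture's data, and the lecture's own examples may be set up differently.

```python
def chi_squared(observed):
    """Return (statistic, degrees_of_freedom) for a table of observed counts.

    Expected counts assume the row and column categories are independent.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2x2 table of counts (illustrative only).
table = [[10, 20], [20, 10]]
stat, dof = chi_squared(table)
print(stat, dof)
```

With one degree of freedom the usual 5% critical value is about 3.84, so a statistic above that would count as evidence against independence for this invented table.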


# Lecture 18: Hypothesis Testing and Correlation

Where the last lecture was about summary statistics for a single set of data, we now address *multi-dimensional* data with several linked sets of values among which we might look for *correlations*. This leads into several more sophisticated questions which are key to the effective application of statistics: how do we identify potential effects like correlation; how do we know when we have found evidence for an effect; and what might this tell us about any *causal* connections?
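
One common way to quantify correlation between two linked sets of values is the Pearson correlation coefficient. The sketch below is a minimal version of it, assuming that this is the measure of interest (the lecture may use others); the function name `pearson` and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Illustrative values only: ys is exactly twice xs.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A value near 1 or -1 indicates a strong linear relationship, but on its own it says nothing about whether one quantity causes the other.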


# Lecture 17: Data Scales and Summary Statistics

This morning’s lecture gave a general overview of *statistics* and their role in analysing quantities of data. Most of the technical constructions — mean, median, mode, standard deviation — are probably familiar to many, but the setting for their application and the computational context may not be.
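
For reference, the four summary statistics named above can be computed with Python's standard `statistics` module. This is only a sketch: the helper `summarise` and the data are invented, and it uses the sample standard deviation (`stdev`), which may differ from the convention used in the lecture.

```python
import statistics


def summarise(values):
    """Return the basic summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(values),
    }


# Made-up data for illustration.
print(summarise([2, 3, 3, 5, 7]))
```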


# Lecture 16: Vector Spaces for Information Retrieval

Today’s lecture presented various techniques to support effective information retrieval: the *big bag of words* model; *term-frequency inverse document frequency* (tf-idf); the *vector space model*; and *cosine similarity* for document ranking.
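
The following sketch shows how these pieces might fit together: documents become sparse term vectors weighted by tf-idf, and a query is ranked against them by cosine similarity. It assumes whitespace tokenisation and the weighting tf × log(N / df), which is one common variant and not necessarily the one used in the lecture; all function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vectors, idf) with one sparse tf-idf vector (dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors, idf


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tf_idf_vectors(docs)
    q_tf = Counter(query.lower().split())
    # Terms unseen in the collection get weight 0.
    q_vec = {term: freq * idf.get(term, 0.0) for term, freq in q_tf.items()}
    scores = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]


# Invented mini-collection for illustration.
docs = ["the cat sat on the mat", "the dog chased the cat", "dogs bark"]
print(rank("mat", docs))
```

Under this weighting a term that appears in every document gets an idf of zero, so it contributes nothing to the ranking.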


# Lecture 15: Information Retrieval

Following the rectangular tables of relational databases and the triangular trees of semistructured data, the remaining Inf1-DA lectures will address the representation and analysis of more *unstructured* data. Today’s lecture provided a brief introduction to the classic *information retrieval task* of searching a large collection of documents to find those that match a simple query.
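
As a minimal sketch of the retrieval task, the code below builds an inverted index from terms to document numbers and returns the documents that contain every term of a query. The matching rule (all query terms must appear), the whitespace tokenisation, and the names `build_index` and `search` are assumptions made for illustration, not details taken from the lecture.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return sorted ids of documents containing all terms in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return sorted(matches)


# Invented documents for illustration.
docs = [
    "relational databases store tables",
    "XML documents form trees",
    "search engines index documents",
]
index = build_index(docs)
print(search(index, "documents"))
```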


# Lecture 14: Example Corpora Applications

Corpora are widely used for computational research into language, and for engineering natural-language computer systems. In linguistics, they make it possible to do real experimental science: to formulate hypotheses about the structure of languages, or changes in language between different places, times or people; and then test these on data. In building applications that handle text or speech, corpora provide the mass quantities of raw material used for machine learning and other algorithms.


# Lecture 13: Annotation of Corpora

Today’s lecture described some of the annotations added to text corpora, how they are generated, and some simple analyses; as well as indicating how these relate to applications such as empirical linguistics and the engineering of systems which work with natural languages.


# Lecture 12: Corpora

In literature a *corpus* (plural *corpora*) is a collection of written texts, in particular the complete works of a single author or a body of writing on a single subject. In *computational linguistics* and in *theoretical linguistics* a corpus is a body of written or spoken text used for study of a particular language or language variety. These corpora may be very large (billions of words) and provide the raw material for experimental investigation of real-world language use: the science of *empirical linguistics*.
