# Tutorial Notes: Statistical Analysis

Notes on solutions for this week’s tutorial are now online. Tutors are marking coursework submissions, and will return them with feedback in next week’s tutorial.

Thanks to everyone who contributed to the revision topics poll. The strongest interest was in material from the earlier parts of the course, so I’ll pick some past questions in that area to go over in lectures.

# Lecture 19: chi² Testing on Categorical Data

Today’s lecture covered more on hypothesis testing, presenting the χ² test and working through three examples: student Inf1-DA exam results in 2011, bigram frequency in the British National Corpus, and possible gender bias in student admissions to Berkeley in 1973.
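
As a rough illustration of the calculation behind the test, here is a minimal sketch of Pearson's chi-squared statistic for a contingency table. The function name `chi_squared` and the counts are made up for this note; they are not the figures from any of the lecture examples.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table of counts, for illustration only
observed = [[10, 20],
            [20, 10]]
print(chi_squared(observed))  # roughly 6.67
```

A 2×2 table has (2−1)×(2−1) = 1 degree of freedom, for which the critical value at the 0.05 significance level is about 3.84. A statistic above that would lead us to reject the null hypothesis of independence at that level.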

# Lecture 18: Hypothesis Testing and Correlation

Where the last lecture was about summary statistics for a single set of data, we now address multi-dimensional data with several linked sets of values among which we might look for correlations. This leads into several more sophisticated questions which are key to the effective application of statistics: how do we identify potential effects like correlation; how do we know when we have found evidence for an effect; and what might this tell us about any causal connections?
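
One common way to measure correlation between two linked sets of values is the Pearson correlation coefficient. The sketch below is only an illustration: the function name `pearson` and the sample values are assumptions made for this note, and it is not necessarily the exact formulation used in the lecture.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired measurements, for illustration only
hours_a = [6, 7, 8, 5, 9]
hours_b = [2, 3, 5, 1, 4]
print(pearson(hours_a, hours_b))
```

The result always lies between −1 and 1. A value near either end suggests a strong linear relationship, but on its own it says nothing about whether one quantity causes the other.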

# Tutorial Exercises: Update

The sheet of exercises for Tutorial 8 posted last night was inconsistent about whether Question 2 was measuring sleep or exercise hours. This was my error. It’s now corrected, and there’s a revised version online in the usual place. My apologies, and thanks to the student who raised this on Piazza, and to Fabian for fixing it.

# Tutorial Exercises: Statistical Analysis

Exercises for Tutorial 8 are now online, as well as notes on solutions for Tutorial 7.

This week you will apply statistical tests to the survey data gathered in the first lecture of the course, looking for possible connections between sleep, exercise, and choice of operating system. This uses techniques to be presented in this afternoon’s lecture and, awkwardly, next Tuesday’s. To support this, I’ve prepared and posted both sets of lecture slides in advance.

Thanks to everyone who submitted their coursework assignment yesterday. These are now going out to individual tutors, who will mark them and give you feedback on your work in the Week 11 tutorial.

# Lecture 17: Data Scales and Summary Statistics

This morning’s lecture gave a general overview of statistics and their role in analysing quantities of data. Most of the technical constructions — mean, median, mode, standard deviation — are probably familiar to many, but the setting for their application and the computational context may not be.
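
For anyone who wants to try these out, the four statistics named above are all available in Python's standard `statistics` module. The helper `summarise` and the sample values are made up for this note. It uses the population standard deviation (`pstdev`); whether the population or the sample version is appropriate depends on the setting, so treat that choice as an assumption.

```python
import statistics


def summarise(values):
    """Return the basic summary statistics of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Population standard deviation; statistics.stdev gives the sample version
        "stdev": statistics.pstdev(values),
    }


# Hypothetical data, for illustration only
summary = summarise([2, 3, 3, 5, 7])
for name, value in summary.items():
    print(name, value)
```

Note that mean and standard deviation only make sense for data on a numerical scale, while the mode can be computed even for purely categorical data.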

# Lecture 16: Vector Spaces for Information Retrieval

Today’s lecture presented various techniques to support effective information retrieval: the big bag of words model; term-frequency inverse document frequency (tf-idf); the vector space model; and cosine similarity for document ranking.
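
To show how these pieces fit together, here is a minimal sketch that weights terms by tf-idf and compares documents by cosine similarity. There are several variants of tf-idf; this one uses the raw term count multiplied by log(N / document frequency), which may differ from the definition given in the lecture. The function names and the tiny documents are made up for this note.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Map each document (a list of words) to a {term: tf-idf weight} dict."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vectors.append({term: count * math.log(n_docs / doc_freq[term])
                        for term, count in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # A zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


# Hypothetical bag-of-words documents, for illustration only
docs = [["data", "model"], ["data", "query"], ["vector"]]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # the two documents share the term "data"
print(cosine(vecs[0], vecs[2]))  # no shared terms, so similarity is 0
```

To rank documents against a query, one would build a vector for the query in the same way and sort the documents by their cosine similarity to it.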

# Tutorial Exercises: Information Retrieval

Exercises for Tutorial 7 are now online, together with notes on solutions for Tutorial 6.

Question 1 you can do immediately; Question 2 requires material covered in Friday’s lecture.

The exercises for this tutorial are shorter than those in previous weeks. That’s because you will also be working on the coursework assignment. This tutorial is an opportunity for you to talk about that and ask your tutor questions. Please plan for this: come to the tutorial prepared to discuss your progress on the assignment.