Lecture 18: Hypothesis Testing and χ²

Today’s lecture revisited the idea of correlation in data sets, and introduced hypothesis testing as a method for deciding whether features observed in samples might in fact have arisen by chance.

For paired series of numerical data we can use the correlation coefficient, and for qualitative data the χ² statistic. The lecture included examples of these applied to last year’s Inf1-DA exam results, bigram frequency in the British National Corpus, and possible gender bias in student admissions to Berkeley in 1973.
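As a rough illustration of the two statistics, here is a minimal Python sketch. It assumes the Pearson form of the correlation coefficient and the usual χ² statistic for a contingency table; the function names and all the numbers are invented for illustration and are not the lecture's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient for two paired numerical series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def chi_squared(table):
    """Chi-squared statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected under the null hypothesis of independence.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Made-up paired scores, purely illustrative.
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))

# Made-up 2x2 table of counts, purely illustrative.
print(chi_squared([[30, 20], [20, 30]]))
```

For a 2×2 table there is one degree of freedom, and the critical value of χ² at the 5% level is about 3.84. A statistic above that threshold would lead us to reject the null hypothesis that the two features are independent.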

Links: Slides for Lecture 18; Recording; Music

What Next?
1. Do This

Find “statistically significant” results yourself in 60 years of data on the US economy.

2. Read This
Berkeley Admissions

This article analyses the admissions data closely and concludes that they do point to serious issues of discrimination, although not quite in the places first indicated.

Links: Publisher’s page; Full text by University Library subscription

The bias in the aggregated data stems not from any pattern of discrimination on the part of admissions committees, which seem quite fair on the whole, but apparently from prior screening at earlier levels of the educational system. Women are shunted by their socialization and education toward fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects.

  • Wikipedia page on Simpson’s Paradox, of which the Berkeley admissions case is a well-known example.
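To see how aggregated data can point the opposite way from every individual department, here is a small Python sketch of Simpson's Paradox. The department names and counts are hypothetical, chosen only to make the effect visible; they are not the Berkeley figures.

```python
# Hypothetical (applicants, admitted) counts per department and group.
data = {
    "Dept A": {"men": (100, 80), "women": (20, 18)},
    "Dept B": {"men": (20, 4), "women": (100, 30)},
}

def rate(applied, admitted):
    """Admission rate for one group in one department."""
    return admitted / applied

def aggregate_rate(group):
    """Admission rate for a group pooled over all departments."""
    applied = sum(dept[group][0] for dept in data.values())
    admitted = sum(dept[group][1] for dept in data.values())
    return admitted / applied

for name, dept in data.items():
    print(name, "men:", rate(*dept["men"]), "women:", rate(*dept["women"]))

print("Overall", "men:", aggregate_rate("men"), "women:", aggregate_rate("women"))
```

In this invented data, women have the higher admission rate in each department, yet the lower rate overall, because most of them apply to the department that admits fewer applicants.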