Lecture 18: Hypothesis Testing and Correlation

Where the last lecture was about summary statistics for a single set of data, we now address multi-dimensional data with several linked sets of values among which we might look for correlations. This leads into several more sophisticated questions that are key to the effective application of statistics: how do we identify potential effects like correlation; how do we know when we have found evidence for an effect; and what might this tell us about any causal connections?

The slides include formulae for computing the correlation coefficient for two datasets, for estimating the correlation coefficient of population data from a random sample, and for identifying when that estimate is statistically significant.

I’ve also included a host of warnings about the dangers of “significance” testing, p-values, and the risks of confusing correlation with causation.
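As a rough illustration of the calculations summarised above, here is a minimal Python sketch. It assumes the coefficient in question is Pearson's r and that significance is judged with the usual t-statistic on n − 2 degrees of freedom; the slides may present this differently, for example with a table of critical values for r. The function names and the sample data are made up for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products of deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t-statistic for testing 'population correlation is zero' (n - 2 d.f.)."""
    if n < 3:
        raise ValueError("need at least 3 paired observations")
    if abs(r) >= 1:
        # A perfect correlation gives an unbounded statistic.
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1 - r * r))

# Made-up sample of 10 paired observations, for illustration only.
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
marks = [52, 48, 60, 55, 67, 63, 70, 74, 69, 80]

r = pearson_r(hours, marks)
t = t_statistic(r, len(hours))

# Two-tailed 5% critical value of Student's t with 8 degrees of freedom.
T_CRITICAL_DF8 = 2.306

print(f"r = {r:.3f}, t = {t:.3f}")
print("significant at the 5% level" if abs(t) > T_CRITICAL_DF8
      else "not significant at the 5% level")
```

The critical value here is hard-coded for 10 observations; a different sample size needs the value for its own n − 2 degrees of freedom. Passing the threshold says only that a correlation this strong would be unlikely if the population correlation were zero. It says nothing about why the two variables move together.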

Links: Slides for Lecture 18; Video of Lecture 18


Too long? Skip to the comics at the bottom. They’re good.

  • Wikipedia on Anscombe’s Quartet.

  • F. J. Anscombe. Graphs in Statistical Analysis, The American Statistician, 27(1):17–21, February 1973.

    Short, readable article advocating the importance of graphing your data before making judgements on it. This is the source of the quartet.
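To see why the quartet makes the case for graphing first, the sketch below computes the same summary statistics for all four datasets. The values are the commonly reproduced quartet figures, typed in by hand here, so treat them as an assumption and check them against the article before relying on them. The helper name `pearson_r` is my own.

```python
from math import sqrt
from statistics import mean, variance

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Datasets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

ANSCOMBE = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                   7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                   6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (X4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                   5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (xs, ys) in ANSCOMBE.items():
    print(f"{name:>3}: mean x = {mean(xs):.2f}, mean y = {mean(ys):.2f}, "
          f"var x = {variance(xs):.2f}, var y = {variance(ys):.2f}, "
          f"r = {pearson_r(xs, ys):.3f}")
```

The four rows come out almost identical, yet a scatter plot of each dataset looks completely different: a noisy line, a smooth curve, a tight line with one outlier, and a vertical stack with one outlier.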

The Bad News About Significance Testing
Correlation may not be Causation
Counting Tanks

XKCD: Correlation

XKCD: Significance