
This lecture followed on from Friday’s in looking at the use of hypothesis testing to detect correlations in data. The first section examined the χ² test for working with qualitative data, using two demonstration examples: possible correlation between coursework submission and exam grades; and the discovery of collocations in large text corpora.
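As a rough illustration of the kind of calculation the χ² test involves, here is a minimal sketch in Python for a 2×2 contingency table. The counts are invented purely for illustration and are not the figures used in the lecture; the function name `chi_squared_2x2` is likewise hypothetical. The sketch uses no continuity correction, and it relies on the fact that, with one degree of freedom, the tail probability of the χ² distribution can be written with the complementary error function.

```python
import math

def chi_squared_2x2(table):
    """Return (statistic, p_value) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Count we would expect in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (table[i][j] - expected) ** 2 / expected

    # A 2x2 table has one degree of freedom; for that case the
    # chi-squared tail probability equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Hypothetical counts, invented for this example only.
# Rows: submitted coursework / did not submit. Columns: passed exam / failed exam.
observed = [[60, 20],
            [20, 20]]

statistic, p_value = chi_squared_2x2(observed)
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```

With these invented counts the statistic comes out at 7.5, above the 3.84 critical value for one degree of freedom at the 5% level, so the null hypothesis of independence would be rejected. The same function could in principle be applied to a table of bigram counts when looking for collocations.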

The second part of the lecture looked in more detail at some of the risks in misapplying statistical tests. Hypothesis testing can be a tremendously sensitive and powerful tool for discovering new science and identifying the connections between events. However, when used poorly it becomes misleading and unhelpful. The lecture covered a range of concerns about these risks: confusing correlation with causation; what p-values can tell us and what they can’t; when statistical “significance” is really about being statistically detectable; p-hacking, data dredging, outcome switching; and the current replication crisis in some experimental sciences. There is also hope and success, though: in the discovery of robust results through meta-analysis; the active discussions around reproducibility and predictive power in scientific research; and the many projects to record trials, replicate results, and improve publication of both negative and positive outcomes.
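One way to see why data dredging is risky is to simulate experiments in which there is, by construction, nothing to find. The sketch below is an assumption-laden toy, not something taken from the lecture: both groups are drawn from the same normal distribution with known variance, a simple two-sided z-test is applied, and the seed, sample size, and number of trials are arbitrary choices.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def null_experiment(rng, n=50):
    """Compare two groups drawn from the SAME distribution, so any 'effect' is noise."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    diff = sum(a) / n - sum(b) / n
    z = diff / math.sqrt(2 / n)  # each group has known variance 1
    return two_sided_p(z)

rng = random.Random(19)  # arbitrary seed, fixed only for repeatability
trials = 1000
p_values = [null_experiment(rng) for _ in range(trials)]

# Fraction of pure-noise experiments that still look "significant" at the 5% level.
false_positive_rate = sum(p < 0.05 for p in p_values) / trials

# Chance of at least one "significant" result among 20 independent null tests.
at_least_one_in_20 = 1 - 0.95 ** 20

print(f"fraction of null experiments with p < 0.05: {false_positive_rate:.3f}")
print(f"P(at least one false positive in 20 tests): {at_least_one_in_20:.2f}")
```

Roughly one null experiment in twenty should fall below the 0.05 threshold, and the chance of at least one such result among 20 independent tests is about 64%. Testing many hypotheses and reporting only the ones that "worked" therefore produces findings even when there is no real effect.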

Links: Slides for Lecture 19; Recording of Lecture 19

Do This

Find a spurious correlation of your own.

Also Do This

Start your exam preparation. If you haven’t read the article from last time, do that now. Then write a summary list of the topics covered through this course: section by section for structured, semistructured, and unstructured data. At Friday’s lecture I’ll give a review of all the course topics. Bring your list along so that you can compare it with mine.


Too long? Skip to the comics at the bottom.

Correlation Does Not Imply Causation

Chart of time-series correlation
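Time series are a common source of spurious correlations, because any two quantities that both trend over time will appear correlated. The following sketch uses made-up data (two independent noisy series that each happen to grow over time) to show the effect, and then repeats the calculation on the step-to-step changes, where the shared trend no longer dominates. All names, slopes, noise levels, and the seed are illustrative assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def differences(series):
    """Step-to-step changes: series[t+1] - series[t]."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(2018)  # arbitrary seed, fixed only for repeatability
steps = range(50)

# Two unrelated quantities: each has its own upward trend plus independent noise.
series_a = [1.0 * t + rng.gauss(0, 3) for t in steps]
series_b = [2.0 * t + rng.gauss(0, 3) for t in steps]

r_levels = pearson(series_a, series_b)
r_changes = pearson(differences(series_a), differences(series_b))

print(f"correlation of the raw series:       {r_levels:.2f}")
print(f"correlation of step-to-step changes: {r_changes:.2f}")
```

The raw series correlate strongly simply because both increase with time, while the changes from one step to the next show little relationship. Neither series has any influence on the other.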

The Bad News About Significance Testing

Working to Make Things Better
  • The Cochrane Collaboration. Global project systematically reviewing evidence to improve health decisions.
  • Ben Goldacre. Bad Pharma: How Medicine is Broken, and How We Can Fix It. Fourth Estate, 2013.
  • AllTrials. Campaign for all clinical trials, past and present, to be registered and their results reported.
  • The COMPare Project. Inspecting published clinical trials to see whether they omit results or switch outcomes.
  • FDAAA Trials Tracker. Certain clinical trials in the USA are now required by law to publish online whatever results they find. The first of these fell due last month, February 2018; this website tracks which results have been reported and how effective the law has been.

XKCD: Correlation

XKCD: Significance

Lecture 19: The χ² Test; Correlation and Causation