This lecture followed on from Friday’s, continuing the use of hypothesis testing to detect correlations in data. The first section examined the χ² test for working with qualitative data, using two demonstration examples: a possible correlation between coursework submission and exam grades, and the discovery of collocations in large text corpora.
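As a rough illustration of the first example, here is a minimal sketch of a χ² test of independence in Python. The counts and the use of scipy are assumptions made for demonstration only; they are not the lecture’s data or tools.

```python
# Sketch of a chi-squared test of independence on qualitative data.
# The counts below are invented for illustration; they are NOT the
# lecture's coursework/exam figures.
from scipy.stats import chi2_contingency

# Rows: coursework submitted / not submitted.
# Columns: exam passed / exam failed.
observed = [[62, 18],
            [20, 25]]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
# A small p-value is evidence against independence, i.e. that submission
# and exam outcome appear associated in this (made-up) table.
```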
The second part of the lecture looked in more detail at some of the risks in misapplying statistical tests. Hypothesis testing can be a tremendously sensitive and powerful tool for discovering new science and identifying the connections between events. However, when used poorly it becomes misleading and unhelpful. The lecture covered a range of concerns about these risks: confusing correlation with causation; what p-values can tell us and what they can’t; when statistical “significance” is really about being statistically detectable; p-hacking, data dredging, outcome switching; and the current replication crisis in some experimental sciences. There is also hope and success, though: in the discovery of robust results through meta-analysis; the active discussions around reproducibility and predictive power in scientific research; and the many projects to record trials, replicate results, and improve publication of both negative and positive outcomes.
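To make the data-dredging point concrete, here is a small simulation sketch (my own construction, not from the lecture): testing many unrelated noise variables against a noise outcome still produces “significant” correlations at the usual 5% threshold.

```python
# Simulated data dredging: pure noise tested many times will still
# yield "significant" p-values at the 5% level roughly 5% of the time.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
outcome = rng.normal(size=100)            # 100 observations of pure noise

trials = 200
false_positives = 0
for _ in range(trials):                   # 200 unrelated candidate predictors
    predictor = rng.normal(size=100)
    r, p = pearsonr(predictor, outcome)
    if p < 0.05:
        false_positives += 1

print(f"{false_positives} of {trials} noise variables reached p < 0.05")
# Expect around 10 spurious "discoveries": report only those and the
# result looks like science, which is exactly the danger described above.
```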
Links: Slides for Lecture 19; Recording of Lecture 19
Homework
Do This
Find a spurious correlation of your own.
Also Do This
Start your exam preparation. If you haven’t read the article from last time, do that now. Then write a summary list of the topics covered through this course: section by section for structured, semistructured, and unstructured data. At Friday’s lecture I’ll give a review of all the course topics. Bring your list along so that you can compare it with mine.
References
Too long? Skip to the comics at the bottom.
- Correlation Does Not Imply Causation
- The Bad News About Significance Testing
- Reproducibility
- Working to Make Things Better
Correlation Does Not Imply Causation
- Tobacco Use and Academic Achievement. CDC: Centers for Disease Control.
- Health-Risk Behaviors and Academic Achievement. CDC with statistics relating academic achievement to just about everything: physical exercise, alcohol, sex, watching television, and carrying a weapon. Who knew?
- Smoking Gun, Jean Marston, March 2008.
Letter to New Scientist recalling Fisher’s sceptical response to Doll and Bradford Hill’s work connecting smoking and lung cancer.
- Spurious Correlations. Data dredging at its best.
The Bad News About Significance Testing
- Elina Hemminki. Study of information submitted by drug companies to licensing authorities. British Medical Journal 1980;280:833–836.
Some of the first work identifying significant variations in which studies drug companies published and which they did not. This link is to a page in the James Lind Library, an initiative to record the development of fair tests in healthcare.
- John P. A. Ioannidis. Why Most Published Research Findings Are False. PLoS Medicine 2005, 2(8):e124. DOI: 10.1371/journal.pmed.0020124
- David Trafimow and Michael Marks. Editorial. Basic and Applied Social Psychology 37(1):1–2, 2015. DOI: 10.1080/01973533.2015.1012991
This is the announcement that BASP will not accept papers using p-values, significance testing, or confidence intervals.
- Allen Downey. Statistical Inference is only Mostly Wrong. Probably Overthinking It, March 2015.
Blog article responding to the BASP ban.
- Ella Rhodes. Liberating, or Locking Away our Best Tools? The Psychologist, 2015.
Commentary on the BASP ban, reporting reactions from a range of parties.
- Dalmeet Singh Chawla. ‘One-size-fits-all’ threshold for P values under fire. Nature News, September 2017.
Reporting on the proposal that p-values need to be ten times smaller (p < 0.005) for results to be considered significant.
Reproducibility
- Many scientific studies can’t be replicated. That’s a problem. Washington Post, August 2015.
- Over half of psychology studies fail reproducibility test. Nature News, August 2015.
- Open Science Collaboration. Estimating the Reproducibility of Psychological Science. Science 349(6251), 2015. DOI: 10.1126/science.aac4716
- Reproducibility: A tragedy of errors. Nature News, February 2016.
- Psychology’s reproducibility problem is exaggerated – say psychologists. Nature News, March 2016.
- Taking on Chemistry’s Reproducibility Problem. Chemistry World, March 2017.
- Latest Reproducibility Project Study Fails to Replicate. The Scientist, March 2018.
The most recent result from a project to independently repeat experimental results from high-profile scientific papers. This one was about the biology of colon cancer.
Working to Make Things Better
- The Cochrane Collaboration. Global project systematically reviewing evidence to improve health decisions.
- Ben Goldacre. Bad Pharma: How Medicine is Broken, and How We Can Fix It. Fourth Estate, 2013.
- AllTrials. Campaign for all clinical trials, past and present, to be registered and their results reported.
- The COMPare Project. Inspecting published clinical trials to see whether they omit results or switch outcomes.
- FDAAA Trials Tracker. Certain clinical trials in the USA are now required by law to publish online whatever results they find. The first of these reports were due last month, February 2018; this website tracks the results and effectiveness of that law.