Where the last lecture was about summary statistics for a single set of data, we now address multi-dimensional data with several linked sets of values among which we might look for correlations. This leads into several more sophisticated questions which are key to the effective application of statistics: how do we identify potential effects like correlation; how do we know when we have found evidence for an effect; and what might this tell us about any causal connections?
The slides include formulae for computing the correlation coefficient of two datasets, for estimating the correlation coefficient of a population from a random sample, and for identifying when a result is statistically significant.
I’ve also included a host of warnings about the dangers of “significance” testing, p-values, and the risks of confusing correlation with causation.
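As a rough illustration of the kind of calculation involved, here is a minimal Python sketch. It assumes the standard Pearson sample correlation coefficient and the usual t-statistic with n − 2 degrees of freedom for testing against zero correlation; the slides may present these in a different form. The function names and the data values are hypothetical, chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two lists of equal length, at least 3 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t-statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom).

    Undefined when r is exactly +1 or -1.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical paired measurements, invented for this example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
t = t_statistic(r, len(xs))
print(f"r = {r:.3f}")  # r = 0.775
print(f"t = {t:.3f}")  # t = 2.121
```

With only five points, a two-sided test at the 5% level needs |t| above roughly 3.18 (3 degrees of freedom), so even this fairly strong-looking sample correlation would not count as significant. Small samples make impressive correlations cheap, which is one reason for the warnings that follow.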
Link: Slides for Lecture 18
Too long? Skip to the comics at the bottom. They’re good.
The Bad News About Significance Testing
John P. A. Ioannidis. Why Most Published Research Findings Are False.
PLoS Medicine 2005 2(8):e124.
David Trafimow and Michael Marks. Editorial.
Basic and Applied Social Psychology 37(1):1–2, 2015.
This is the announcement that BASP will not accept papers using p-values, significance testing, or confidence intervals.
Allen Downey. Statistical Inference is only Mostly Wrong.
Probably Overthinking It, March 2015.
Blog article responding to the BASP ban.
Ella Rhodes. Liberating, or Locking Away our Best Tools?
The Psychologist, 2015.
Commentary on the BASP ban, reporting reactions from a range of parties.
Correlation may not be Causation
Tobacco Use and Academic Achievement. CDC: Centers for Disease Control.
Health-Risk Behaviors and Academic Achievement. CDC with statistics relating academic achievement to just about everything: physical exercise, alcohol, sex, watching television, and carrying a weapon. Who knew?
Smoking Gun, Jean Marston, March 2008.
Letter to New Scientist recalling Fisher’s sceptical response to Doll and Bradford Hill’s work connecting smoking and lung cancer.
John Aldrich. A Guide to R. A. Fisher.
Kill or Cure? Documenting the Daily Mail’s achievements in ontological oncology.
Spurious Correlations. Data dredging at its best.
Wikipedia on the “German Tank Problem”.
Ruggles and Brodie. An Empirical Approach to Economic Intelligence in World War II, Journal of the American Statistical Association, 42(237):72–91, March 1947.
This is my source for the tank counting data, on page 89.