Notes on The Book of Why (Chapter 2)
This post is a collection of notes / thoughts I'm having while reading Chapter 2 of The Book of Why.
I love this description of the Friday Evening Discourse at the Royal Institution of Great Britain in London. Let's embrace science as theater!
The Central Limit Theorem
The central limit theorem is described in the book. Wikipedia has a more formal description:
This is followed immediately by a more easily understood explanation:
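My own shorthand version (paraphrasing, not quoting either source): if $X_1, \ldots, X_n$ are independent, identically distributed samples with mean $\mu$ and finite variance $\sigma^2$, then as $n$ grows,

$$\sqrt{n}\,(\bar{X}_n - \mu) \;\xrightarrow{d}\; \mathcal{N}(0, \sigma^2),$$

which is to say the sample mean $\bar{X}_n$ is approximately normal with mean $\mu$ and variance $\sigma^2 / n$ for large $n$.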
Hmm, how can I map this explanation back to the example in the book with the Galton board? Each marble has some chance of landing in each of the 10 slots along the bottom. But each slot doesn't have equal probability. So, what's the probability of a marble landing in the i-th slot? This question is answered by something called the binomial distribution:
In this case, the Galton board sends each marble through a sequence of n-1 "trials" (with n = 10 slots, that's 9 rows of pegs). Each trial has two outcomes: Did the ball go left (failure) or right (success)? There are 10 possible outcomes: 0 successes, 1 success, ... 9 successes.
In a perfectly constructed Galton board, the marble has a 50-50 chance of success at each layer. But binomial distributions aren't just for trials with 50-50 odds.
The binomial distribution describes the probability that a marble will land in each of the 10 slots.
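Spelling out the standard formula (my paraphrase, not a quote from the book): if each of $n$ independent trials succeeds with probability $p$, then the probability of exactly $k$ successes is

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}.$$

For an ideal 10-slot Galton board, $n = 9$ and $p = 1/2$, so the probability of landing in slot $k$ (counting from 0) is $\binom{9}{k} / 2^9$.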
One important thing about the binomial distribution is that it is discrete, not continuous. The graph in the image above intentionally shows dots, not a smooth curve.
The point of the central limit theorem is this (there's a small simulation sketch after this list):
- Send 100 marbles through the Galton board and compute the mean result.
- Repeat this experiment 1000 times, so you have 1000 means.
- Look at the probability function of those means. (That is, the function describing the probability of getting a mean in any particular range.)
- That function will better approximate the normal distribution as you increase the number of marbles and the number of experiments.
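Here's a minimal simulation sketch of that experiment (my own code, not from the book; it assumes an ideal 9-row board and uses numpy's binomial sampler in place of physical marbles):

```python
# Sketch: simulate the Galton board and look at the distribution of sample means.
import numpy as np

rng = np.random.default_rng(0)

n_rows = 9          # peg rows per marble -> 10 possible slots (0..9)
n_marbles = 100     # marbles per experiment
n_experiments = 1000

# Each marble's slot is the number of "right" bounces in 9 fair coin flips,
# i.e. a Binomial(9, 0.5) draw.
slots = rng.binomial(n_rows, 0.5, size=(n_experiments, n_marbles))

# One mean per experiment: 1000 means in total.
means = slots.mean(axis=1)

# The CLT says these means should cluster around 4.5 (= 9 * 0.5) with a
# standard deviation near sqrt(9 * 0.25 / 100) = 0.15.
print(means.mean(), means.std())
```

A histogram of `means` (e.g. with matplotlib) shows the familiar bell shape, and it gets smoother as you crank up `n_marbles` and `n_experiments`.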
Why is this so important? Because it means if we come up with some method of inference that works with normal distributions, there's usually some trick to make that method useful on a boatload of other problems involving other types of distributions. In other words, it saves us all a bunch of work! (This is especially important when you're doing all this work by hand because electronic computers haven't been invented yet.)
Non-Causal explanation for regression to the mean
Tall fathers tend to have tall sons, but generally speaking the sons are not as tall as the fathers.
Tall sons tend to have tall fathers, but generally speaking the fathers are not as tall as the sons.
This graph illustrates why this is a principle of statistics, not a causal law of physics.
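To check that symmetry for myself, here's a toy sketch (my own code with made-up numbers, not the book's data): father and son heights drawn as jointly normal variables with correlation 0.5.

```python
# Toy sketch: regression to the mean works in both directions.
import numpy as np

rng = np.random.default_rng(1)

mean, sd, rho = 175.0, 7.0, 0.5   # made-up height stats (cm) and correlation
cov = [[sd**2, rho * sd**2],
       [rho * sd**2, sd**2]]
fathers, sons = rng.multivariate_normal([mean, mean], cov, size=100_000).T

tall_fathers = fathers > mean + sd
tall_sons = sons > mean + sd

# Sons of tall fathers are above average, but not as tall as their fathers...
print(fathers[tall_fathers].mean(), sons[tall_fathers].mean())
# ...and fathers of tall sons are above average, but not as tall as their sons.
print(sons[tall_sons].mean(), fathers[tall_sons].mean())
```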
The discussion proceeds to explain correlation coefficients, which fascinated me the other day when I learned how to calculate and display the correlation coefficients between the features in (my processed version of) the Titanic dataset:
Protip: If you want to survive the Titanic, be a rich woman.
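The computation itself was basically a one-liner; here's a sketch (the file path and column names are stand-ins, not necessarily what I actually used):

```python
# Sketch: pairwise correlation coefficients for a processed Titanic dataset.
import pandas as pd

df = pd.read_csv("titanic_processed.csv")  # stand-in path for my processed dataset

# Pearson correlation between every pair of numeric features.
corr = df.corr(numeric_only=True)

# "Survived" is a stand-in column name; sort features by how strongly they
# correlate with survival.
print(corr["Survived"].sort_values(ascending=False))
```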
Bringing causation back
Galton originally wanted to explain the stability of the population despite Darwinian variations from one generation to the next, but he abandoned his quest. Now, The Book of Why revisits the problem.
Causal diagrams clearly show that Galton's causal model was wrong, because it allowed luck (which should be applied independently to each generation) to accrue between generations:
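To see why accruing luck is a problem, here's a toy sketch (my own code, made-up numbers): if each generation's height is just the previous height plus fresh luck, the spread of the population keeps growing instead of staying stable; with regression toward the mean, it settles down.

```python
# Toy sketch: accruing luck vs. regression toward the mean across generations.
import numpy as np

rng = np.random.default_rng(2)

mean, sd, rho = 175.0, 7.0, 0.5        # made-up numbers
n_people, n_generations = 100_000, 10

heights_accrue = np.full(n_people, mean)   # Galton's (wrong) model: luck accrues
heights_regress = np.full(n_people, mean)  # regression toward the mean + fresh luck

for _ in range(n_generations):
    luck = rng.normal(0, sd, n_people)
    heights_accrue = heights_accrue + luck
    heights_regress = mean + rho * (heights_regress - mean) + luck

# The first spread keeps growing (~ sd * sqrt(generations)); the second stays stable.
print(heights_accrue.std(), heights_regress.std())
```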
Later, Pearson followed the work of Galton. Pearson was also wrong about causation, but I love the description of how he felt while reading Galton. Let's embrace this excitement for science and knowledge!
The next section of the book is quite fascinating. It is a history of Pearson and the other people who came after Galton. I won't include so many screenshots here, but if you're curious about the book, I can (so far) recommend getting it and reading it: it's been quite entertaining along the way and not at all dry.
We can all agree that this drawing is amazing.
Wow, the first causal diagram ever published:
Unrelated, I liked this quote.
Ever since the description of an "inference engine" in Chapter 1, I have been wondering how we go about coming up with the "causal model". Finally, I think I am seeing some explanation for how you'd go about that, and how you'd find out when your causal model was wrong:
Here is another clarifying point. The goal of causal analysis is not to prove that X is a cause of Y, nor to find the causes of Y from scratch. (This is what I thought the author was saying it was, so I'm glad he clarified this point.) That latter problem is called causal discovery.
Instead, this book is about "representing plausible causal knowledge in mathematical language" and (with data) "answering causal queries that are of practical value."
In other words, you actually need to have a theory / explanation / causal model in order to use causal inference.
More
This whole page is amazing to me. I didn't know what Bayesian statistics were.
The world is fuzzier and more difficult than I thought. Improving our subjective reasoning is necessary for us to get closer to our objective reality.