Notes on The Book of Why (Chapter 2)
This post is a collection of notes / thoughts I'm having while reading Chapter 2 of The Book of Why.
I love this description of the Friday Evening Discourse at the Royal Institution of Great Britain in London. Let's embrace science as theater!
The Central Limit Theorem
The central limit theorem is described in the book. Wikipedia has a more formal description:
This is followed immediately by a more easily understood explanation:
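My own shorthand version (paraphrasing, not quoting either source): if $X_1, \ldots, X_n$ are independent, identically distributed samples with mean $\mu$ and finite variance $\sigma^2$, then as $n$ grows,

$$\sqrt{n}\,(\bar{X}_n - \mu) \;\xrightarrow{d}\; \mathcal{N}(0, \sigma^2),$$

which is to say the sample mean $\bar{X}_n$ is approximately normal with mean $\mu$ and variance $\sigma^2 / n$ for large $n$.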
Hmm, how can I map this explanation back to the example in the book with the Galton board? Each marble has some chance of landing in each of the 10 slots along the bottom. But each slot doesn't have equal probability. So, what's the probability of a marble landing in the i-th slot? This question is answered by something called the binomial distribution:
In this case, the Galton board sends each marble through a sequence of n-1 "trials" (with n = 10 slots, that's 9 rows of pegs). Each trial has two outcomes: Did the ball go left (failure) or right (success)? There are 10 possible outcomes: 0 successes, 1 success, ... 9 successes.
In a perfectly constructed Galton board, the marble has a 50-50 chance of success at each layer. But binomial distributions aren't just for trials with 50-50 odds.
The binomial distribution describes the probability that a marble will land in each of the 10 slots.
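Spelling out the standard formula (my paraphrase, not a quote from the book): if each of $n$ independent trials succeeds with probability $p$, then the probability of exactly $k$ successes is

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}.$$

For an ideal 10-slot Galton board, $n = 9$ and $p = 1/2$, so the probability of landing in slot $k$ (counting from 0) is $\binom{9}{k} / 2^9$.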
One important thing about the binomial distribution is that it is discrete, not continuous. The graph in the image above intentionally shows dots, not a smooth curve.
The point of the central limit theorem is this (there's a small simulation sketch after this list):
- Send 100 marbles through the Galton board and compute the mean result.
- Repeat this experiment 1000 times, so you have 1000 means.
- Look at the probability function of those means. (That is, the function describing the probability of getting a mean in any particular range.)
- That function will better approximate the normal distribution as you increase the number of marbles and the number of experiments.
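Here's a minimal simulation sketch of that experiment (my own code, not from the book; it assumes an ideal 9-row board and uses numpy's binomial sampler in place of physical marbles):

```python
# Sketch: simulate the Galton board and look at the distribution of sample means.
import numpy as np

rng = np.random.default_rng(0)

n_rows = 9          # peg rows per marble -> 10 possible slots (0..9)
n_marbles = 100     # marbles per experiment
n_experiments = 1000

# Each marble's slot is the number of "right" bounces in 9 fair coin flips,
# i.e. a Binomial(9, 0.5) draw.
slots = rng.binomial(n_rows, 0.5, size=(n_experiments, n_marbles))

# One mean per experiment: 1000 means in total.
means = slots.mean(axis=1)

# The CLT says these means should cluster around 4.5 (= 9 * 0.5) with a
# standard deviation near sqrt(9 * 0.25 / 100) = 0.15.
print(means.mean(), means.std())
```

A histogram of `means` (e.g. with matplotlib) shows the familiar bell shape, and it gets smoother as you crank up `n_marbles` and `n_experiments`.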
Why is this so important? Because it means if we come up with some method of inference that works with normal distributions, there's usually some trick to make that method useful on a boatload of other problems involving other types of distributions. In other words, it saves us all a bunch of work! (This is especially important when you're doing all this work by hand because electronic computers haven't been invented yet.)
Non-Causal explanation for regression to the mean
Tall fathers tend to have tall sons, but generally speaking the sons are not as tall as the fathers.
Tall sons tend to have tall fathers, but generally speaking the fathers are not as tall as the sons.
This graph illustrates why this is a principle of statistics, not a causal law of physics.
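To check that symmetry for myself, here's a toy sketch (my own code with made-up numbers, not the book's data): father and son heights drawn as jointly normal variables with correlation 0.5.

```python
# Toy sketch: regression to the mean works in both directions.
import numpy as np

rng = np.random.default_rng(1)

mean, sd, rho = 175.0, 7.0, 0.5   # made-up height stats (cm) and correlation
cov = [[sd**2, rho * sd**2],
       [rho * sd**2, sd**2]]
fathers, sons = rng.multivariate_normal([mean, mean], cov, size=100_000).T

tall_fathers = fathers > mean + sd
tall_sons = sons > mean + sd

# Sons of tall fathers are above average, but not as tall as their fathers...
print(fathers[tall_fathers].mean(), sons[tall_fathers].mean())
# ...and fathers of tall sons are above average, but not as tall as their sons.
print(sons[tall_sons].mean(), fathers[tall_sons].mean())
```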
The discussion proceeds to explain correlation coefficients, which fascinated me the other day when I learned how to calculate and display the correlation coefficients between the features in (my processed version of) the Titanic dataset:
Protip: If you want to survive the Titanic, be a rich woman.
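The computation itself was basically a one-liner; here's a sketch (the file path and column names are stand-ins, not necessarily what I actually used):

```python
# Sketch: pairwise correlation coefficients for a processed Titanic dataset.
import pandas as pd

df = pd.read_csv("titanic_processed.csv")  # stand-in path for my processed dataset

# Pearson correlation between every pair of numeric features.
corr = df.corr(numeric_only=True)

# "Survived" is a stand-in column name; sort features by how strongly they
# correlate with survival.
print(corr["Survived"].sort_values(ascending=False))
```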
Bringing causation back
Galton originally wanted to explain the stability of the population despite Darwinian variations from one generation to the next, but he abandoned his quest. Now, The Book of Why revisits the problem.
Causal diagrams clearly show that Galton's causal model was wrong, because it allowed luck (which should be applied independently to each generation) to accrue between generations:
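To see why accruing luck is a problem, here's a toy sketch (my own code, made-up numbers): if each generation's height is just the previous height plus fresh luck, the spread of the population keeps growing instead of staying stable; with regression toward the mean, it settles down.

```python
# Toy sketch: accruing luck vs. regression toward the mean across generations.
import numpy as np

rng = np.random.default_rng(2)

mean, sd, rho = 175.0, 7.0, 0.5        # made-up numbers
n_people, n_generations = 100_000, 10

heights_accrue = np.full(n_people, mean)   # Galton's (wrong) model: luck accrues
heights_regress = np.full(n_people, mean)  # regression toward the mean + fresh luck

for _ in range(n_generations):
    luck = rng.normal(0, sd, n_people)
    heights_accrue = heights_accrue + luck
    heights_regress = mean + rho * (heights_regress - mean) + luck

# The first spread keeps growing (~ sd * sqrt(generations)); the second stays stable.
print(heights_accrue.std(), heights_regress.std())
```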
Later, Pearson followed the work of Galton. Pearson was also wrong about causation, but I love the description of how he felt while reading Galton. Let's embrace this excitement for science and knowledge!
The next section of the book is quite fascinating. It is a history of Pearson and the other people who came after Galton. I won't include so many screenshots here, but if you're curious about the book, I can (so far) recommend getting it and reading it: it's been quite entertaining along the way and not at all dry.
We can all agree that this drawing is amazing.
Wow, the first causal diagram ever published:
Unrelated, I liked this quote.
Ever since the description of an "inference engine" in Chapter 1, I have been wondering how we go about coming up with the "causal model". Finally, I think I am seeing some explanation for how you'd go about that, and how you'd find out when your causal model was wrong:
Here is another clarifying point. The goal of causal analysis is not to prove that X is a cause of Y, nor to find the causes of Y from scratch. (This is what I thought the author was saying it was, so I'm glad he clarified this point.) That latter problem is called causal discovery.
Instead, this book is about "representing plausible causal knowledge in mathematical language" and (with data) "answering causal queries that are of practical value."
In other words, you actually need to have a theory / explanation / causal model in order to use causal inference.
More
This whole page is amazing to me. I didn't know what Bayesian statistics were.
The world is fuzzier and more difficult than I thought. Improving our subjective reasoning is necessary for us to get closer to our objective reality.