The Book of Why: The New Science of Cause and Effect


Statistics gives us the means to describe data, but not to interpret it. We can describe the lifespan L of patients who take a drug D with P(L | D). We can compare it to the lifespan of patients who don't take the drug, P(L | ~D). But statistics alone can't tell us whether taking the drug extends patients' lifespans, or whether a patient who died without the drug might have survived had he taken it.

The Book of Why, by Turing Award winner Judea Pearl and coauthor Dana Mackenzie, describes the science of causal inference. It answers questions that science has long avoided: Why did something happen? Did product sales increase because of a policy change or because of our marketing campaign? Does taking the drug increase lifespan?

The book promises to overcome a major roadblock to explanatory theory, summarized in the maxim, "correlation does not imply causation".

An intriguing recommendation

As I've mentioned in recent blog posts, I am working my way through the fast.ai course. In chapter 9 of the associated textbook, Deep Learning for Coders with fastai and PyTorch, the authors ask an interesting question: how can we determine how much a tractor's YearMade affects its sale price? The screenshot below explains why you cannot simply take the mean sale price of the tractors made in each year:

screenshot

Instead, we have to imagine new data (the same tractors, but with a different YearMade) and predict sale prices for this imagined data:

screenshot

At first glance, this is a strange thing to do. And while it makes some intuitive sense, I don't know why we should believe this kind of analysis is reliable. However, there's an interesting aside:

screenshot

As much as I don't want to be distracted by too many tangents on my way to becoming an effective data science practitioner, I couldn't help but follow my curiosity and start reading this book. After reading the introduction, I'm glad I made the trip.

The Introduction

For as long as I can remember, I assumed that science had no methods for answering questions like these:

screenshot

When I took a course on probability & statistics in college, the instructors made clear that statistics doesn't explain cause and effect:

screenshot

Now, this book promises a way to answer them using causal diagrams and an algebra-like symbolic language:

screenshot
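As a taste of that symbolic language (my own summary, not a quote from the introduction): the effect of doing something is written with a do-operator, P(L | do(D)), which in general differs from the seeing quantity P(L | D). Assuming, purely for illustration, that the severity S of a patient's condition is the only confounder between taking the drug D and lifespan L, the two can be written side by side. The only difference is how each severity level is weighted: by P(S = s | D) among the patients who happened to take the drug, or by P(S = s) for the whole population (the back-door adjustment formula).

```latex
% Seeing: ordinary conditioning weights each severity level by P(S = s | D)
P(L \mid D) = \sum_{s} P(L \mid D, S = s)\, P(S = s \mid D)

% Doing: back-door adjustment weights each severity level by P(S = s)
P(L \mid do(D)) = \sum_{s} P(L \mid D, S = s)\, P(S = s)
```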

This lengthy excerpt gave me a first taste of what the authors describe as the difference between "doing" and "seeing". Statistics, which describes data, is concerned with "seeing". To understand causal relationships, we must formalize what is meant by "doing" something. This is a tricky business. It's not enough to compare the lifespans of patients who take a drug against the lifespans of patients who don't, because many other variables might confound the result. You must somehow isolate the variable in question (taking the drug) from everything else (for example, a patient's greater willingness to take the drug because their condition is severe).

screenshot
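To see how "seeing" and "doing" can disagree, here is a small simulation of the drug-and-severity example, a minimal sketch built on a hypothetical toy model with made-up numbers (not data from the book). Severe patients are more likely to take the drug, severity shortens lifespan, and the drug adds two years for everyone. "Seeing" compares drug takers with non-takers in the observed data; "doing" forces every patient to take (or skip) the drug; the adjusted estimate applies the back-door adjustment to the observed data, which works here only because severity was recorded.

```python
import random
from statistics import mean


# Hypothetical toy model (made-up numbers, not from the book):
#   severity -> drug, severity -> lifespan, drug -> lifespan
def simulate(n, rng, do_drug=None):
    """Return n (severe, drug, lifespan) tuples.

    When do_drug is None, patients choose the drug based on severity ("seeing").
    Otherwise every patient is forced to do_drug, which cuts the
    severity -> drug arrow ("doing").
    """
    patients = []
    for _ in range(n):
        severe = rng.random() < 0.5
        if do_drug is None:
            drug = rng.random() < (0.8 if severe else 0.2)
        else:
            drug = do_drug
        lifespan = 10 + 2 * drug - 5 * severe  # the drug adds 2 years for everyone
        patients.append((severe, drug, lifespan))
    return patients


def naive_difference(patients):
    """'Seeing': average lifespan of drug takers minus that of non-takers."""
    treated = mean(life for _, drug, life in patients if drug)
    untreated = mean(life for _, drug, life in patients if not drug)
    return treated - untreated


def adjusted_difference(patients):
    """Back-door adjustment: within-severity differences weighted by P(S = s)."""
    total = 0.0
    for s in (False, True):
        stratum = [p for p in patients if p[0] == s]
        weight = len(stratum) / len(patients)
        total += weight * naive_difference(stratum)
    return total


rng = random.Random(0)
observed = simulate(100_000, rng)
seeing = naive_difference(observed)
adjusted = adjusted_difference(observed)
doing = (mean(life for _, _, life in simulate(100_000, rng, do_drug=True))
         - mean(life for _, _, life in simulate(100_000, rng, do_drug=False)))
print(f"seeing {seeing:+.2f}, adjusted {adjusted:+.2f}, doing {doing:+.2f}")
```

Working the toy model out by hand, the seeing difference is -1 year (the drug looks harmful, because sicker patients take it more often), while the doing difference is +2 years; the simulated values land close to those, and the adjusted estimate recovers +2 from the observational data alone.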

This is already starting to remind me of the technique the Deep Learning book recommended for determining how YearMade affects SalePrice. It was not enough to find the mean SalePrice for each YearMade. We had to consider "counterfactuals": What if each of these tractors had a different YearMade?
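The "imagined data" trick is what the fastai book calls partial dependence. Below is a minimal sketch in plain Python, assuming a hypothetical toy_model with made-up coefficients in place of the book's trained model, and three invented rows with a made-up MachineHours column (not the real auction data). The naive approach groups the actual rows by YearMade; partial dependence instead sets YearMade to each candidate value in every row, keeps the other columns as they are, and averages the predictions.

```python
from statistics import mean


# Hypothetical stand-in for a trained model; the coefficients are made up.
def toy_model(row):
    return 20_000 + 500 * (row["YearMade"] - 1990) - 2 * row["MachineHours"]


def mean_by_year(rows, predict):
    """Naive approach: group the actual rows by YearMade and average their prices."""
    groups = {}
    for row in rows:
        groups.setdefault(row["YearMade"], []).append(predict(row))
    return {year: mean(prices) for year, prices in groups.items()}


def partial_dependence(rows, predict, feature, values):
    """For each value, overwrite `feature` in EVERY row, predict, and average."""
    return {
        v: mean(predict({**row, feature: v}) for row in rows)
        for v in values
    }


# Invented rows: the older tractor also happens to have more machine hours.
rows = [
    {"YearMade": 1990, "MachineHours": 3000},
    {"YearMade": 2000, "MachineHours": 1000},
    {"YearMade": 2000, "MachineHours": 2000},
]
naive = mean_by_year(rows, toy_model)
pd_curve = partial_dependence(rows, toy_model, "YearMade", [1990, 2000])
print(naive)
print(pd_curve)
```

Here the naive per-year averages differ by 8,000 between 1990 and 2000, while partial dependence gives 5,000, exactly the toy model's 500 per year. The naive comparison also credits YearMade with the effect of the older tractor's extra hours; holding each row's other columns fixed removes that.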

Connections with The Beginning of Infinity

Another book that I've been slowly making my way through, mostly concerned with epistemology ("How do we know what we know?"), is The Beginning of Infinity by David Deutsch. By this point in the introduction to The Book of Why, I was starting to connect it to Deutsch's ideas. The two books are different, but somehow related.

Consider this excerpt (from Why):

screenshot

These are some of the same words that Deutsch is concerned with in Infinity. (And before Deutsch, Popper.)

In Why, each of these plays a different role in an inference engine, which the authors posit is what will allow AI agents to truly think with human-level intelligence.

screenshot

Like Infinity, Why distinguishes between "knowledge" and mere "data" or "information". Furthermore, Why places special importance on a causal model, which might be an abstract definition of what Infinity refers to as an "explanation". My guess is that a good causal model, like a good "explanation", will be hard to vary and will have reach.

Infinity's idea of being hard to vary may be related to how the inference engine's causal model might need to be revised:

screenshot

Or perhaps it is related to the uncertainty measurements that are now standard and well understood by statisticians:

screenshot

Reach from Infinity becomes Adaptability in Why:

screenshot

Why and Infinity are also similar in that they both posit ways to get ever closer to correct answers. That is, they aim to always be less wrong, with the ultimate goal of being arbitrarily precise.

(Note that always being able to be less wrong is not the same as being able to be arbitrarily precise. The first is satisfied even by a monotonically increasing function that approaches 80% accuracy. The second requires a function that approaches (but never hits) 100% accuracy. It's unclear to me which of these two worlds Infinity thinks we live in.)
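Two illustrative sequences of accuracies (my own example, not from either book) make the distinction concrete. Both improve at every step, but the first never gets past 80%, so its error never drops below 20%, while the second gets arbitrarily close to 100%:

```latex
a_n = 0.8\,(1 - 2^{-n}), \qquad a_{n+1} > a_n, \qquad \lim_{n \to \infty} a_n = 0.8

b_n = 1 - 2^{-n}, \qquad b_{n+1} > b_n, \qquad \lim_{n \to \infty} b_n = 1
```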

The rest of the book

So far, I've only read the introduction. But I was excited enough by it to write down some of my thoughts and capture screenshots of the passages that stood out to me.

Even though I believe we've barely begun to see the transformative effects that AI will have on society, I still feel that I'm "behind" the cutting edge and need to catch up. So this promise from the summary of chapter 1 motivates me to read on:

screenshot

I'm also especially looking forward to Chapter 6, which looks at classic paradoxes in a new light.

screenshot

The summary of Chapter 8 succinctly explains why counterfactuals are at the heart of what we mean by A "is a cause of" B. When we say "A caused B", we mean something like, "If it weren't for A, B wouldn't have happened" or "If A hadn't happened, then neither would B."

screenshot
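The "if it weren't for A" reading can be sketched with two deterministic toy models. These are hypothetical functions of my own, not the book's formalism. The test flips one input, keeps every other input at its actual value, and asks whether the outcome would have changed.

```python
# Hypothetical toy models (my own examples, not from the book).
def fire_needs_both(match, oxygen):
    """Fire happens only if a match is struck AND oxygen is present."""
    return match and oxygen


def fire_either_source(match, lightning):
    """Fire happens if EITHER a match is struck OR lightning strikes."""
    return match or lightning


def is_but_for_cause(model, actual, variable):
    """Would the outcome have been different if `variable` had been different?

    Flip one boolean input, keep every other input at its actual value,
    and compare the model's outcome in the actual and counterfactual worlds.
    """
    counterfactual = {**actual, variable: not actual[variable]}
    return model(**actual) != model(**counterfactual)


print(is_but_for_cause(fire_needs_both, {"match": True, "oxygen": True}, "match"))        # True
print(is_but_for_cause(fire_either_source, {"match": True, "lightning": True}, "match"))  # False
```

The second case shows why the summary says "something like": when two causes would each have been enough on their own, the simple "if A hadn't happened" test says the match was not a cause, even though it plainly contributed.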