Correlation and causation

I wrote this for an audience that isn’t going to appreciate it, so I’ll share it here.

Correlation is simply pattern noticing: “When I observe this, I also observe that.”

The two things can be totally unrelated, both can be triggered by the same precursor event, or one can be causing the other (note: these aren’t all the possible states, but we’ll keep it simple for the not-rocket-scientists here ;) ). And, well, there’s always observer bias involved. [1]

What the statement “correlation does not imply causation” means, at the heart of things, is that simply observing two things happening together tells you absolutely nothing, on its own, about the relationship between them.

Yes, we observe lots of emails sent by the winning side of an election. We can observe this. It is fact. There is a correlation, in these two elections, between well orchestrated email campaigns and a win.

But the observation alone tells us nothing else. Is it the branding and persuasive effect of the email? Maybe. Is it simply that the campaigns were better funded? Possibly. Is it that the campaigns sending more email were better run? Could be. Is it that the election accurately reflects the will of the voters? Possibly. [2]

That’s what scientists mean when we say correlation does not imply causation. We can observe “these two things seem to act in non-independent ways.” But that doesn’t mean that one causes the other. It could be that they’re totally unrelated. For instance, there is a correlation between the divorce rate in Maine and the per capita consumption of margarine.

It could be that both events are triggered by the same parent event. So it could be that an effectively run campaign led both to effective email usage and to that side winning the vote. The emails don’t trigger the win; they’re triggered by the same thing that triggers the win.
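For the curious, here’s a toy simulation of that “same parent event” idea. Everything in it is made up for illustration: a hidden “campaign quality” score drives both the number of emails and the vote share, and emails have zero effect on votes. The two still come out correlated (around 0.5 with these settings). Once you statistically remove the effect of campaign quality from both (a crude version of “controlling for factors”), the correlation drops to roughly zero. This is a sketch under those stated assumptions, not how a real election analysis would be done.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, zs):
    """What is left of ys after removing a straight-line (least-squares) fit on zs."""
    n = len(ys)
    mz = sum(zs) / n
    my = sum(ys) / n
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum(
        (z - mz) ** 2 for z in zs
    )
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


def simulate(n=10_000, seed=42):
    """Hypothetical model: campaign quality causes both emails and votes."""
    rng = random.Random(seed)
    quality = [rng.gauss(0, 1) for _ in range(n)]    # hidden common cause
    emails = [q + rng.gauss(0, 1) for q in quality]  # depends on quality only
    votes = [q + rng.gauss(0, 1) for q in quality]   # depends on quality only, NOT on emails
    raw = pearson(emails, votes)
    # "Control" for quality by correlating only the parts it doesn't explain.
    controlled = pearson(residuals(emails, quality), residuals(votes, quality))
    return raw, controlled


if __name__ == "__main__":
    raw, controlled = simulate()
    print(f"raw correlation:         {raw:.2f}")
    print(f"controlling for quality: {controlled:.2f}")
```

The catch, of course, is that this only works because the simulation knows what the hidden cause is. In real life you have to guess which factors to control for, and you can only control for the ones you actually measured.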

And it could be that the email actually caused the win. To really tell, though, you need to do a LOT more data crunching and controlling for other factors that might have affected the outcome of the election. You probably also need to be a statistician to actually do that level of work. You also need to look at more than two data points. You especially need to look at elections that were lost and identify how those campaigns were using email. But this is a lot squishier science than I did (and I was only a molecular biologist), so I can’t even start to comprehend how you’d collect and analyze the data.

[1] One of the things that was drilled into me as a working scientist is that we see what we want to see. The essence of being a good scientist is looking beyond what you want to see and making sure your own observations are as neutral and as bias-free as possible. Or, if they can’t be bias-free, designing experiments that will negate or reveal your bias. [1a]

[1a] The recent “no such thing as gluten sensitivity” research is a good example. The scientist saw what he expected, but then did further research with much stricter controls and a different study design to see if the results held up. They didn’t.

[2] Here in CA it’s not always the case that the better-funded candidate (or referendum) wins. Meg Whitman and Carly Fiorina are two examples of this, and there are also a number of propositions where the side that spent less won.
