Spurious relationships for beginners


Simen Sørbøe Solbakken
16th November 2019

Example of a spurious relationship between Hair Length and Number of Diamond Rings

Let's conduct a survey and ask 100 fictional people about their Hair Length and Number of Diamond Rings:

We see that people with longer Hair Length have a higher Number of Diamond Rings than people with shorter Hair Length. Thus, there is a statistical relationship between Hair Length and Number of Diamond Rings. If we run a regression analysis on the data above, we get the following regression line:
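To make this concrete, here is a minimal Python sketch that simulates such a survey and fits the regression line. These are not the survey numbers from this article: the seed, the 50/50 gender split, the hair lengths in centimetres and the ring counts are all assumptions made up for illustration. Gender is used behind the scenes to generate the data, but the regression itself only sees Hair Length and Number of Diamond Rings.

```python
import numpy as np

rng = np.random.default_rng(2019)  # arbitrary seed, only for reproducibility

# Hypothetical survey of 100 fictional people (all numbers are assumptions).
n = 100
female = np.repeat([0, 1], n // 2)                      # 50 men, then 50 women
hair_cm = 20 + 20 * female + rng.normal(0, 6, size=n)   # hair length depends on gender only
rings = np.where(female == 1,
                 rng.integers(2, 5, size=n),            # women: 2, 3 or 4 rings
                 rng.integers(0, 2, size=n))            # men: 0 or 1 ring

# Simple regression: Number of Diamond Rings on Hair Length only.
slope, intercept = np.polyfit(hair_cm, rings, deg=1)
corr = np.corrcoef(hair_cm, rings)[0, 1]

print(f"rings = {intercept:.2f} + {slope:.3f} * hair_cm")
print(f"correlation = {corr:.2f}")
```

With data generated this way, the slope should come out positive, even though Hair Length plays no part in how the ring counts were created.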

Based on our survey, we conclude that Hair Length causes people to have a higher Number of Diamond Rings. Let's grow longer hair!

Investigating a spurious relationship with scatter plots

But wait. Something is wrong with this conclusion. Does Hair Length really cause people to have a higher Number of Diamond Rings? Or could it be that females have both longer Hair Length and a higher Number of Diamond Rings? Let's ask our 100 people about their Gender too:

It's clear from the scatter plot above that females have both longer Hair Length and a higher Number of Diamond Rings. Thus, Hair Length probably doesn't cause people to have a higher Number of Diamond Rings. It's all about Gender!
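The same check can be done with numbers instead of a scatter plot. The sketch below uses made-up data (the seed, group sizes, hair lengths and ring counts are assumptions, not the article's survey) and summarises each Gender separately. If Gender is behind the pattern, we would expect the two groups to differ clearly in both variables, while the correlation inside each group stays weak.

```python
import numpy as np

rng = np.random.default_rng(2019)  # arbitrary seed, only for reproducibility

# Hypothetical survey data (all numbers are assumptions).
n = 100
female = np.repeat([0, 1], n // 2)
hair_cm = 20 + 20 * female + rng.normal(0, 6, size=n)
rings = np.where(female == 1,
                 rng.integers(2, 5, size=n),
                 rng.integers(0, 2, size=n))

# Summarise each gender on its own: group means and within-group correlation.
summary = {}
for label, mask in (("male", female == 0), ("female", female == 1)):
    within_corr = np.corrcoef(hair_cm[mask], rings[mask])[0, 1]
    summary[label] = (hair_cm[mask].mean(), rings[mask].mean(), within_corr)
    print(f"{label:>6}: mean hair = {summary[label][0]:.1f} cm, "
          f"mean rings = {summary[label][1]:.2f}, "
          f"within-group correlation = {within_corr:.2f}")
```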

Since Gender has a statistical relationship with both Hair Length and Number of Diamond Rings, we are fooled into concluding that longer Hair Length causes people to have a higher Number of Diamond Rings. Thus, the relationship between Hair Length and Number of Diamond Rings is a spurious relationship.

A spurious relationship is a statistical relationship that fools us into seeing cause and effect where there is none. The statistical relationship between Hair Length and Number of Diamond Rings is such a spurious relationship, caused by the third variable Gender, which affects both Hair Length and Number of Diamond Rings.

Identifying a spurious relationship with statistics

How can we use statistics to tell whether the statistical relationship between Hair Length and Number of Diamond Rings is a spurious relationship? Let's do a regression analysis, including all three variables Gender, Hair Length and Number of Diamond Rings:

The brown regression line is from our first regression analysis, including Hair Length and Number of Diamond Rings only. The purple regression line shows the effect of Hair Length on Number of Diamond Rings from the new regression analysis, where all three variables Gender, Hair Length and Number of Diamond Rings are included.

The new purple regression line above clearly shows that Hair Length doesn't affect Number of Diamond Rings at all. Let's show the effect of Gender on Number of Diamond Rings from the same regression analysis:

Based on these two purple regression lines, we conclude that Hair Length doesn't affect Number of Diamond Rings at all. The statistical relationship between Hair Length and Number of Diamond Rings is a spurious relationship. The real explanation for why people have a different Number of Diamond Rings is Gender.
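A regression with all three variables can be sketched in a few lines of Python. As before, the data are made up (seed, group sizes, hair lengths and ring counts are assumptions), and plain least squares via NumPy stands in for whatever regression software you prefer. The point is only to compare the Hair Length coefficient before and after Gender is added.

```python
import numpy as np

rng = np.random.default_rng(2019)  # arbitrary seed, only for reproducibility

# Hypothetical survey data (all numbers are assumptions).
n = 100
female = np.repeat([0, 1], n // 2)
hair_cm = 20 + 20 * female + rng.normal(0, 6, size=n)
rings = np.where(female == 1,
                 rng.integers(2, 5, size=n),
                 rng.integers(0, 2, size=n)).astype(float)

# First regression: rings on Hair Length only (the "brown" line).
slope_simple = np.polyfit(hair_cm, rings, deg=1)[0]

# New regression: rings on Hair Length and Gender together (the "purple" lines).
X = np.column_stack([np.ones(n), hair_cm, female])  # intercept, hair, gender
coef, *_ = np.linalg.lstsq(X, rings, rcond=None)
b_intercept, b_hair, b_female = coef

print(f"Hair Length effect, Gender omitted : {slope_simple:.3f}")
print(f"Hair Length effect, Gender included: {b_hair:.3f}")
print(f"Gender effect (female vs. male)    : {b_female:.2f}")
```

In this simulated setup the Hair Length coefficient should shrink towards zero once Gender is included, while the Gender coefficient picks up the difference in ring counts.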

Omitted-variable bias and regression analysis

We can only tell whether a statistical relationship is a spurious relationship if we include the variables causing the spurious relationship in the regression analysis. If we had drawn our conclusion from the brown regression line above, where Gender was omitted, we would never have found out that the statistical relationship between Hair Length and Number of Diamond Rings was a spurious relationship.

To be 100% sure that a statistical relationship between a Variable A and a Variable B is not really a spurious relationship, we have to include all possible variables affecting both variables in our regression analysis. Failing to do so can make us conclude that Variable A affects Variable B, while it's really a spurious relationship caused by the omitted Variable C. The error that the omitted variable introduces into the regression is known as omitted-variable bias, which is why the two terms are so closely linked.
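For ordinary least squares, the size of this bias can be written down exactly: the effect of A estimated with C omitted equals the effect of A with C included, plus the effect of C multiplied by the slope from regressing C on A. The sketch below checks that identity on made-up data, with Hair Length as Variable A, Number of Diamond Rings as Variable B and Gender as Variable C. The seed and all data-generating numbers are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2019)  # arbitrary seed, only for reproducibility

# Hypothetical survey data (all numbers are assumptions).
n = 100
female = np.repeat([0, 1], n // 2)                      # Variable C
hair_cm = 20 + 20 * female + rng.normal(0, 6, size=n)   # Variable A
rings = np.where(female == 1,
                 rng.integers(2, 5, size=n),
                 rng.integers(0, 2, size=n)).astype(float)  # Variable B

# Regression with Gender included.
X = np.column_stack([np.ones(n), hair_cm, female])
_, b_hair, b_female = np.linalg.lstsq(X, rings, rcond=None)[0]

# Regression with Gender omitted.
slope_short = np.polyfit(hair_cm, rings, deg=1)[0]

# How strongly the omitted variable moves with Hair Length.
delta = np.polyfit(hair_cm, female, deg=1)[0]

# Omitted-variable bias: effect of C times its association with A.
bias = b_female * delta
print(f"effect with Gender omitted  : {slope_short:.4f}")
print(f"effect with Gender included : {b_hair:.4f}")
print(f"omitted-variable bias       : {bias:.4f}")
print(f"included effect + bias      : {b_hair + bias:.4f}")
```

The last line should match the first, which shows that the whole gap between the two Hair Length effects is accounted for by the omitted variable.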


Similar terms and concepts
Terms and concepts similar to spurious relationships:
  • False correlation
  • Omitted-variable bias

Easystat makes it as easy as possible to analyze whether a statistical relationship is really a spurious relationship. Using statistical expert knowledge and artificial intelligence, Easystat automatically selects the best statistical methods for you.
