Partially spurious relationships for beginners


    Simen Sørbøe Solbakken
    16th November 2019

    Example of a partially spurious relationship between Education and Income

    Let's conduct a survey and ask 100 fictional people about their Education and Income:

    We see that people with more Education have higher Income than people with less Education. Thus, there is a statistical relationship between Education and Income. If we run a regression analysis on the data above, we get the following regression line:

    The slope of the brown regression line above shows how much Income increases when Education increases by 1.
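    To make the idea of a regression line concrete, here is a minimal sketch of how such a line can be fitted by ordinary least squares. The function name `fit_line` and the five data points are made up for illustration; they are not the survey data behind the plot above.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up illustrative values (assumption), not the article's survey data.
education = [8, 10, 12, 14, 16]
income = [20, 26, 29, 35, 40]

slope, intercept = fit_line(education, income)
print(f"Income = {intercept:.2f} + {slope:.2f} * Education")
```

    The slope is the number to read off: it is the change in Income that goes with a one-unit increase in Education.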

    Investigating a spurious relationship with scatter plots

    But, wait. Could it be that males have both more Education and higher Income? Let's ask our 100 people about their Gender too:

    It's clear from the scatter plot above that males have both more Education and higher Income. Thus, the statistical relationship between Education and Income is a partially spurious relationship. Education could still be important to people's Income. However, part of the apparent effect is caused by Gender.

    Partially spurious relationships are partially false statistical relationships. The statistical relationship between Education and Income is one of them: a third variable, Gender, affects both Education and Income.
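    The same check can be done without a plot by comparing group averages. The sketch below assumes a tiny made-up data set of six people (the records and the helper `group_means` are hypothetical, chosen only to mirror the pattern described above).

```python
# Made-up illustrative records (assumption): (gender, education, income).
people = [
    ("female", 8, 20), ("female", 10, 22), ("female", 12, 24),
    ("male", 12, 34), ("male", 14, 36), ("male", 16, 38),
]


def group_means(rows):
    """Return {gender: (mean education, mean income)}."""
    sums = {}
    for gender, edu, inc in rows:
        count, edu_total, inc_total = sums.get(gender, (0, 0.0, 0.0))
        sums[gender] = (count + 1, edu_total + edu, inc_total + inc)
    return {g: (e / n, i / n) for g, (n, e, i) in sums.items()}


means = group_means(people)
for gender, (mean_edu, mean_inc) in means.items():
    print(f"{gender}: mean Education = {mean_edu:.1f}, mean Income = {mean_inc:.1f}")
```

    If one group is higher on both variables, as the males are in this toy data, the third variable is a candidate cause of part of the relationship.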

    Remove the spurious (false) part of a statistical relationship with statistics

    How can we remove the spurious (false) part of the statistical relationship between Education and Income using statistics? Let's do a regression analysis, including all three variables Gender, Education and Income:

    The brown regression line is from our first regression analysis, including Education and Income only. The purple regression line shows the effect of Education on Income from the new regression analysis, where all three variables Gender, Education and Income are included.

    The slope of the purple regression line above shows how much Income increases when Education increases by 1 once Gender is taken into account. When we include Gender in the regression analysis, the estimated effect of Education on Income becomes smaller than in the first analysis. This reduction shows that the statistical relationship between Education and Income is a partially spurious relationship.

    Based on this purple regression line, we report its slope as the increase in Income when Education increases by 1, controlled for Gender.
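    One way to see what "controlled for Gender" means is to compare Education and Income only within each gender. The sketch below does this by subtracting each gender's average from both variables before fitting the line. For a single categorical control variable, this gives the same Education slope as a regression that includes Gender as a second predictor. The data and the helper names `slope` and `demean_by_group` are made up for illustration, and this is not necessarily the method Easystat uses.

```python
def slope(x, y):
    """Least squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean from the values of its members."""
    totals = {}
    for v, g in zip(values, groups):
        count, total = totals.get(g, (0, 0.0))
        totals[g] = (count + 1, total + v)
    means = {g: total / count for g, (count, total) in totals.items()}
    return [v - means[g] for v, g in zip(values, groups)]


# Made-up illustrative values (assumption), not the article's survey data.
gender = ["female", "female", "female", "male", "male", "male"]
education = [8, 10, 12, 12, 14, 16]
income = [20, 22, 24, 34, 36, 38]

# First analysis: Education and Income only.
naive_slope = slope(education, income)

# Second analysis: remove the Gender differences, then fit again.
adjusted_slope = slope(
    demean_by_group(education, gender),
    demean_by_group(income, gender),
)

print(f"Education only:           {naive_slope:.2f}")     # 2.50
print(f"Controlled for Gender:    {adjusted_slope:.2f}")  # 1.00
```

    In this toy data the slope drops from 2.50 to 1.00 once Gender is controlled for. The part that disappears is the spurious part, and the part that remains is the relationship between Education and Income among people of the same gender.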


    By removing the spurious (false) parts of statistical relationships, Easystat makes it as easy as possible to analyze partially spurious relationships. Using statistical expert knowledge and artificial intelligence, Easystat automatically selects the best statistical methods for you.
