If you have worked with data, then I bet you have been guilty of some kind of statistical fallacy at some point. I know I have!
In this series, we are looking at fallacies that often come up when analysing data and, allegedly, even in academic sources.
This is part two. You can find part one here.
A paradox? How is this a fallacy? I thought that we were talking about fallacies?
We are, but we must introduce this paradox first.
Simpson’s Paradox is a phenomenon where a trend appears in different groups of data but vanishes or reverses when those groups are combined.
The classic example of this is a 1970s study of admissions at the University of California, Berkeley, based on the following data.
Note that the last row shows the total application success rates for both genders. If you look at this data alone, it might seem to suggest that the application success rate for men is higher than that of women.
This led to Berkeley being accused of sexism. But is it as simple as this suggests?
If we look at the data, we notice that 1,800 women applied for subject A, while 1,200 men applied for that same subject.
Of those 1,800 women who applied for subject A, 15% were approved, while 14% of the men who applied were approved.
This is a slightly better result for women. When it comes to applications for subject A, it seems that Berkeley was not being sexist.
Let us look at subject B now. For this subject, 50% of men that applied were approved and 51% of women were approved. Again, this is a slightly better result for women and it is hard to argue that Berkeley was sexist.
How, then, do we explain the fact that women seemed to have a lower overall success rate?
Notice that subject A has a much lower approval rate for both genders. It seems subject A is a competitive subject that is hard to get into.
Now, out of the 2,000 applications made by women, 1,800 were to subject A, and only 270 of those were approved. That is, only 13.5% of all the women's applications ended in an approval for subject A.
For men, out of their 2,000 applications, 1,200 were to subject A, and only 168 of those were approved. That is, 8.4% of all the men's applications.
Note that 1,800 out of 2,000 applications made by women were for subject A. That is 90% of their applications, whereas for men only 60% of applications were for subject A.
A significantly higher proportion of women were applying for subject A, the subject with the much lower approval rate, and most of those applications were going to be rejected.
So, it stands to reason that a higher number of women would have their applications rejected.
So, far from Berkeley being sexist, the real reason women had a lower application success rate is that they tended to make more applications to subjects with a lower application success rate.
Let us recap what happened here. If we look at the data for subjects A and B separately, men and women have about the same chance of being rejected for each subject, with subject A having a much higher rejection rate for both genders.
But if you combine the subjects, you get an overall approval rate of 28% for men and only 19% for women.
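Working through these numbers in code makes the reversal easy to see. Here is a minimal sketch in plain Python, using the approval figures quoted above, that computes the per-subject rates and then the combined rates:

```python
# Applications and approvals per subject, using the figures quoted in
# the article: women applied mostly to the competitive subject A.
applications = {
    ("women", "A"): (1800, 270),  # 15% approved
    ("women", "B"): (200, 102),   # 51% approved
    ("men", "A"): (1200, 168),    # 14% approved
    ("men", "B"): (800, 400),     # 50% approved
}

def rate(applied, approved):
    """Approval rate as a fraction."""
    return approved / applied

# Per subject, women do slightly better than men.
for subject in ("A", "B"):
    women = rate(*applications[("women", subject)])
    men = rate(*applications[("men", subject)])
    print(f"Subject {subject}: women {women:.0%}, men {men:.0%}")

# Combined across subjects, the trend reverses.
for gender in ("women", "men"):
    applied = sum(a for (g, _), (a, _) in applications.items() if g == gender)
    approved = sum(ap for (g, _), (_, ap) in applications.items() if g == gender)
    print(f"Overall {gender}: {approved / applied:.0%} approved")
```

Per subject, women's rates (15% and 51%) edge out men's (14% and 50%), yet the combined rates come out at roughly 19% for women versus 28% for men, because 90% of women's applications went to the low-approval subject A.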
This seems to suggest that women are rejected more often than men, even though this trend did not appear when we looked at each subject separately.
This is Simpson’s Paradox at play. It is the statistical phenomenon where trends disappear or reverse when you combine data.
In this case, women had a slight advantage in approval rates for each subject. But if you combine the subjects without accounting for the much higher proportion of women applying for subject A, the trend reverses and women appear to have a lower application success rate!
The fallacy would be failing to recognize why the trend seems to reverse and assigning some erroneous cause.
The reason we observe the lower application success rate for women is not sexism, but the fact that a higher proportion of women are applying for a more difficult subject.
When you see Simpson's Paradox, you should study the data and try to identify the cause of this paradoxical disappearance or reversal of trends, rather than simply assuming some erroneous cause.
Try to avoid the fallacy of misinterpreting trends in the data.
Let us take one more example before we move on.
Suppose we have two baseball players, Joe and Martin. During the years 2019 and 2020 we have the following data:
Note that in both 2019 and 2020, Martin had higher batting averages. However, when you combine these years, Joe has a higher batting average.
What gives? Although Joe had the lower batting average in each individual year, his at-bats were distributed very differently: far more of Joe's at-bats came in the year when both players batted well, which pulls his combined average above Martin's.
You could assume that the data was rigged or that maybe Martin was a better batter after all. But that would be a fallacy.
The real reason the combined totals are better for Joe has more to do with how his at-bats were distributed across the two years than with either player's skill.
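Since the batting table itself is not reproduced here, the sketch below uses hypothetical hits and at-bats chosen to produce the same pattern: Martin ahead in each year, Joe ahead overall.

```python
# Hypothetical (hits, at_bats) per player per year, chosen to
# illustrate the reversal described above; the article's actual
# figures come from its own table.
batting = {
    ("Joe", 2019): (10, 50),       # .200
    ("Martin", 2019): (100, 400),  # .250
    ("Joe", 2020): (200, 500),     # .400
    ("Martin", 2020): (45, 100),   # .450
}

def average(hits, at_bats):
    """Batting average as a fraction."""
    return hits / at_bats

# Martin has the higher average in each individual year...
for year in (2019, 2020):
    joe = average(*batting[("Joe", year)])
    martin = average(*batting[("Martin", year)])
    print(f"{year}: Joe {joe:.3f}, Martin {martin:.3f}")

# ...but Joe wins the combined average: most of his at-bats fall in
# 2020, the year when both players batted well, which weights his
# overall average towards the better year.
for player in ("Joe", "Martin"):
    hits = sum(h for (p, _), (h, _) in batting.items() if p == player)
    at_bats = sum(ab for (p, _), (_, ab) in batting.items() if p == player)
    print(f"{player} combined: {hits / at_bats:.3f}")
```

With these numbers, Martin bats .250 and .450 against Joe's .200 and .400, yet Joe's combined average (210/550 ≈ .382) beats Martin's (145/500 = .290) because 500 of Joe's 550 at-bats fall in the high-average year.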
If you see trends vanish or reverse when data is combined, always look more carefully at the data and see why this might be the case. You are likely to find that there is a perfectly logical reason this happens that has more to do with the data than anything else you might erroneously assume.