The improbable fallacy

TL;DR: probabilities are so darn counter-intuitive that one should not take “99% accuracies” at face value.

Imagine you seek out your doctor because of a chronic case of hypochondria. All tests covered by your insurance turn out negative and your doctor tells you to man up and live with it. Being a House M.D. fan, you refuse to give up before bankrupting your family finances, so you volunteer your spouse to pay out of their own pocket for a further batch of tests for rare diseases. One of those tests turns up positive for spontaneous dentohydroplosion, which affects about 0.2% of the population. Your doctor also tells you that the test has an accuracy of 98%.

How likely is it that you are sick?

With a test accuracy of 98%, your initial hunch would probably be that you almost certainly suffer from spontaneous dentohydroplosion. But let’s run the numbers through Bayes’ theorem.
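
For reference, here is the theorem written in the notation used in the rest of this post (D for having the disease, T for a positive test). The expanded denominator is the law of total probability, where ¬D stands for not having the disease; that symbol is introduced here only for this formula.

```latex
P(D \mid T) = \frac{P(T \mid D)\, P(D)}{P(T)},
\qquad
P(T) = P(T \mid D)\, P(D) + P(T \mid \neg D)\, P(\neg D)
```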

An explanation of the terms that appear in Bayes’ theorem follows.

D is the event of you having the disease. T is the event of the test turning out positive. P(x) is the probability of an event x and takes real values between 0 and 1. D|T is the event of you having the disease after the test was positive. T|D is the event of the test turning out positive assuming that you have the disease.

P(D) is the probability that you have the disease before you went to the doctor.
P(T) is the probability that the test would turn out positive, irrespective of whether you have the disease or not.
P(D|T) is the probability that you have the disease given that the test was positive.
P(T|D) is the probability that your test is positive if you have the disease.

Let’s try to compute those probabilities.

P(D) = 0.002 because 0.2% of the population have the disease, most of them unknowingly. So that would be your chance of having the disease before taking the test.

P(T) = 0.98*0.002 + (1-0.98)*(1-0.002) = 0.00196 + 0.01996 = 0.02192. This requires a bit of elaboration: the chance that the test is positive is the chance that you have the disease (0.002) multiplied by the test’s accuracy (0.98), plus the chance that you don’t have the disease (1-0.002) multiplied by the chance of the test being inaccurate (1-0.98). Note that this treats the single “accuracy” figure as applying equally to sick and healthy people.

P(T|D) = 0.98 because that is the test’s accuracy.

Plugging those numbers into Bayes’ theorem, we get
P(D|T) = P(T|D) * P(D) / P(T) = 0.98 * 0.002 / 0.02192 = 0.089416058

So you have only about a 9% chance of actually suffering from spontaneous dentohydroplosion.
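
The calculation above can be reproduced in a few lines of Python. This is a minimal sketch: the function name `posterior_given_positive` and the separate `sensitivity` and `specificity` parameters are my own illustration. The post only gives one "accuracy" figure, so both are assumed to be 0.98 here.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(D|T), the chance of being sick after a positive test."""
    # P(T|D) * P(D): sick people correctly flagged by the test
    true_positive = sensitivity * prior
    # P(T|not D) * P(not D): healthy people wrongly flagged by the test
    false_positive = (1.0 - specificity) * (1.0 - prior)
    # P(T): overall chance of a positive result
    p_positive = true_positive + false_positive
    return true_positive / p_positive


# Assumption: the 98% "accuracy" applies to sick and healthy people alike.
p = posterior_given_positive(prior=0.002, sensitivity=0.98, specificity=0.98)
print(f"P(D|T) = {p:.4f}")  # prints P(D|T) = 0.0894
```

Playing with the `prior` argument is instructive: the rarer the disease, the less a positive result means.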

Discussion

The implications of an almost perfect test being so wrong could be severe. So how can this be? The intuition goes like this: the test has a 2% margin of error. If we randomly tested a sample of 1000 people from the general population, the test would wrongly flag about 20 healthy people as patients, although, based on the disease’s prevalence, we’d expect only 2 actual cases. This means that if the test is positive for you, it is far more likely due to the test’s error margin than to you actually having the disease 😵
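
The same intuition can be checked by simply counting heads. The sketch below uses a hypothetical population of 100,000 people (larger than the 1000 above, purely so that the counts come out as whole numbers) and again assumes the 98% accuracy holds for sick and healthy people alike.

```python
population = 100_000   # hypothetical sample size, chosen to give whole numbers
prevalence = 0.002     # 0.2% of people have the disease
accuracy = 0.98        # assumed to hold for both sick and healthy people

sick = round(population * prevalence)
healthy = population - sick

# Sick people the test correctly flags
true_positives = round(sick * accuracy)
# Healthy people the test wrongly flags
false_positives = round(healthy * (1 - accuracy))

# Among everyone with a positive result, how many are actually sick?
share_sick = true_positives / (true_positives + false_positives)

print(f"sick: {sick}, healthy: {healthy}")
print(f"true positives: {true_positives}, false positives: {false_positives}")
print(f"share of positives who are sick: {share_sick:.1%}")
```

The false positives outnumber the true positives roughly ten to one, which is exactly why a positive result leaves you at about 9%.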

In reality, and since this is a rather serious matter, you should get a second opinion, as the chance of both tests being wrong (assuming that the tests are statistically independent) is much lower.
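
To see how much a second test helps, one can feed the result of the first calculation back in as the new starting probability. This is a rough sketch under the assumptions already stated: the two tests are statistically independent, both have the same 98% accuracy, and both come back positive. The helper name `update_on_positive` is made up for this example.

```python
def update_on_positive(prior, accuracy):
    """Apply Bayes' theorem once for a single positive test result."""
    true_positive = accuracy * prior
    false_positive = (1.0 - accuracy) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


belief = 0.002  # chance of being sick before any testing
for test_number in (1, 2):
    # Assumption: each test is independent and 98% accurate.
    belief = update_on_positive(belief, accuracy=0.98)
    print(f"after positive test {test_number}: {belief:.3f}")
# prints 0.089 after the first test and 0.828 after the second
```

Two independent positives push the probability from about 9% to about 83%, so the second opinion is well worth it either way.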