Breast Cancer Risk

Jun 03, 2021

Last week, a doctor informed my friend V that she had a 47% chance of developing breast cancer.

Scary news. V and I were quite concerned.

At the same time, I wondered: Where did this 47% figure come from? What exactly does it mean to say that V – or anyone else – has a 47% chance of developing cancer (or any other disease)? What does one do with that information? (And, is the situation as bad as it sounds?)

As I mentioned in my first newsletter, statistics as we know it is a recent invention, and it deserves part of the credit for unprecedented advances in various fields. Thanks to statistical analyses, for example, we know that about 1 in 8 American women will develop breast cancer in her lifetime – a clear message to women and health care providers about the importance of screening, self-detection, and choosing a lifestyle that minimizes risk (see here for details).

At the same time, as I will often point out, statistics are easily misrepresented. Sometimes inadvertently, sometimes with malice. In V's case, the problem is one that frequently happens when statistical results are quoted out of context. Let's take a closer look at her situation.

V had scheduled an appointment with her doctor to examine a lump she noticed in one breast. The biopsy revealed a precancerous condition known as atypical hyperplasia, in which normal-looking cells are overproduced and take on unusual shapes and patterns. V had three such hyperplasias in one area of her breast.

Although I didn't talk with V's doctor, an influential 2015 article published in the New England Journal of Medicine (NEJM) appears to explain the 47% figure.

Apparently, V's doctor knew about the NEJM article, because one of the findings it reports was that 47% of women with three or more atypical hyperplasic lesions developed breast cancer within a 25-year period. (If you're a stats nerd, you now know that V's "chances" pertain to risk as opposed to odds.)
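For readers who want the risk-versus-odds distinction spelled out, here is a minimal sketch. Risk is the share of people who develop the condition; odds compare those who do with those who don't. The function names are my own, chosen only for illustration.

```python
def risk_to_odds(risk: float) -> float:
    """Convert a risk (a probability between 0 and 1) into odds."""
    if not 0 <= risk < 1:
        raise ValueError("risk must be at least 0 and less than 1")
    return risk / (1 - risk)


def odds_to_risk(odds: float) -> float:
    """Convert odds back into a risk (probability)."""
    if odds < 0:
        raise ValueError("odds cannot be negative")
    return odds / (1 + odds)


risk = 0.47  # the 25-year figure quoted from the NEJM article
odds = risk_to_odds(risk)
print(f"risk = {risk:.0%}, odds = {odds:.2f} to 1")
```

A 47% risk works out to odds a little under 1 to 1, so the two numbers are not interchangeable even though both are loosely called "chances".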

When I found this statistic in a published study, I assumed I'd end up telling V one of two things:

(a) The methodology of the study seems to be strong, so our concerns are well-founded, or

(b) The methodology was weak, so there's less cause for concern (though you should still monitor your breast health more closely now, as per your doctor's advice).

Instead I ended up telling V this:

How did I arrive at (c)? And what was the rest of my take-home message?

I'm not a medical researcher, but I do know that NEJM is a prestigious journal, and I found no obvious methodological flaws in the research. However, it's easy to spot a (relatively) superficial problem with that 47% figure, as well as one deeper concern.

The superficial problem is that it's extremely unlikely that exactly 47% of the women in question will develop breast cancer within 25 years. Although the sample size of this study was large (over 600 women), a different, comparably large sample would've almost certainly yielded a different percentage. Perhaps 48%. Perhaps 45%. We have no way of telling. (Those of you with statistical expertise know that a confidence interval obtained from a sample could only tell you something about a likely as opposed to actual range of possible values.)

Although this might be considered a superficial problem when methodology is strong, it's worth mentioning because it illustrates two common misuses of statistical data.

One is overgeneralization. As I mentioned, a different sample would've yielded a different percentage. The other problem is referred to sometimes as "false precision". Because statistical findings are expressed numerically, and because numbers are precise (47 isn't the same as 41 or 47.009), it's easy to treat statistical values as indicating a greater degree of precision than the evidence warrants.

(Please note: I'm not blaming the researchers or the doctor for promoting false precision. I'm just saying that once that 47% figure is aired, it's easy for us to make too much of its preciseness.)
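To make the sampling point concrete, here is a small simulation sketch. It rests on assumptions that are mine, not the study's: it pretends the true rate in the population is exactly 47%, and it draws repeated hypothetical samples of 600 women (roughly the overall study size mentioned above) to see how much the observed percentage moves around by chance alone.

```python
import random


def simulate_sample_rates(true_rate, sample_size, n_samples, seed=0):
    """Draw repeated samples and return the observed rate in each one.

    true_rate is an assumed population rate; nobody actually knows it.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(n_samples):
        cases = sum(rng.random() < true_rate for _ in range(sample_size))
        rates.append(cases / sample_size)
    return rates


# Assumed values for illustration only.
rates = simulate_sample_rates(true_rate=0.47, sample_size=600, n_samples=1000)
print(f"lowest observed rate:  {min(rates):.1%}")
print(f"highest observed rate: {max(rates):.1%}")
```

Even with the "truth" pinned at 47%, the individual samples land on both sides of it, which is why quoting 47% as though it were exact invites false precision.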

The deeper problem with the 47% figure is that it doesn't take into account all the differences between women in variables that are known to predict breast cancer. These include variables women can't control (e.g., genetics) as well as those they can (e.g., lifestyle). And we should ask: Does it matter exactly how many atypical hyperplasias were detected? V had exactly three. The 47% figure refers to "three or more".

My point here is that saying V has a 47% chance of developing cancer is kind of like saying that you have X% chance of being robbed at gunpoint if you live in New York City. Your actual chances may be a lot higher or lower than X% depending on your line of work, which part of the city you live in, where you spend your time, etc. This illustrates a further problem of applying group data to an individual case.

Unfortunately, the NEJM article doesn't disaggregate the data in a way that allows me to refine that 47% figure, given what I know about V's diagnosis, demographics, family, health, and lifestyle.

So, what's the take-home message? What could I tell my friend?

Well, further digging turned up some good news. A little complicated, but good. Fasten your seat belts...

The National Cancer Institute provides a highly respected assessment on its website called The Breast Cancer Risk Assessment Tool (BCRAT). By clicking on the "Assess Patient Risk" button, a woman can calculate in less than a minute her lifetime risk of developing breast cancer. Among other things, the BCRAT factors in whether or not the woman has had an atypical hyperplasia.

Using the BCRAT, I found that V has an 18.5% chance of developing breast cancer in her lifetime (as compared to 7.2% for other women of her particular demographic).
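There are two common ways to compare figures like these, and they can leave very different impressions. The sketch below uses only the two BCRAT numbers quoted above; the variable names are mine.

```python
v_risk = 0.185     # BCRAT lifetime estimate for V (from the text)
peer_risk = 0.072  # BCRAT lifetime estimate for her peers (from the text)

# Absolute difference: how many extra cases per 100 women.
absolute_difference = v_risk - peer_risk

# Relative risk: how many times larger V's risk is than her peers'.
relative_risk = v_risk / peer_risk

print(f"absolute difference: {absolute_difference * 100:.1f} percentage points")
print(f"relative risk: {relative_risk:.2f} times the peer risk")
```

Put one way, V's risk is about 11 percentage points higher than her peers'; put the other way, it is roughly 2.6 times as high. Both statements describe the same pair of numbers.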

Wow. One source says that V has a 47% chance of developing breast cancer in the next 25 years, while the other says she has an 18.5% chance of developing breast cancer across her entire lifetime. What a difference!

(Only a small amount of the discrepancy between these figures can be explained by factoring in the number of atypical hyperplasias revealed by V's biopsy, a variable the BCRAT doesn't consider. For example, in the NEJM article, if the number of atypical hyperplasias is discounted, as in the BCRAT, a 30% risk is found for a 25-year period, which is still much higher than 18.5% across one's entire lifetime.)

The authors of the NEJM article claim that the BCRAT underestimates the risk of breast cancer among women with atypical hyperplasias. However, in V’s case there are good reasons for not ignoring the BCRAT's prediction in favor of the 47% figure.

The BCRAT and the NEJM article use overlapping but not identical variables as inputs. Each incorporates variables not found in the other (e.g., only the BCRAT considers race; only the NEJM article distinguishes among hyperplasia subtypes). And the BCRAT treats age as a continuous variable, while in the NEJM article age is trichotomized (<45, 45-55, >55). So, no surprise that their predictions don’t line up perfectly. The question then is what to believe. Do we trust the BCRAT results, the NEJM data, or neither?

Although the sample in the NEJM article was large, the numbers diminish as the data are disaggregated, and disaggregation is only reported one variable at a time. For example, there were only 113 women with three or more atypical hyperplasias, and we don’t know how many of them exhibited other risk factors such as smoking or a family history of breast cancer. (Even if we did know, we wouldn't have much confidence in the findings, given the relatively small sizes of these more finely disaggregated subgroups.) So, I'm already questioning how well the 47% figure applies to V.
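Here is a rough sketch of why shrinking subgroups matter. It computes a 95% Wilson confidence interval for an observed rate of 47% at two sample sizes: 113 (the subgroup size mentioned above) and 600 (a hypothetical, for contrast). One loud caveat: treating the 47% as a simple proportion is my simplification. A 25-year figure like this is often estimated with survival methods, so the study's own interval may differ.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    )
    return center - half_width, center + half_width


# 113 comes from the text; 600 is a hypothetical comparison size.
for n in (600, 113):
    low, high = wilson_interval(0.47, n)
    print(f"n = {n:>3}: {low:.1%} to {high:.1%} (width {high - low:.1%})")
```

Under these assumptions, the interval for 113 women spans roughly nine percentage points on either side of 47%, more than twice as wide as the interval for 600. Slice the 113 further by smoking or family history and the uncertainty grows again.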

Here are two more considerations: Both the BCRAT and the NEJM article (a) rely on data that extend back more than half a century, and (b) do not include lifestyle variables known to be associated with breast cancer. Both considerations are good news for V.

With respect to (a), screening has become more accurate and more prevalent in recent years, and so, moving forward, we can predict declining rates of cancer following a problematic biopsy, under the assumption that precancerous conditions now tend to be detected earlier.

With respect to (b), V doesn't have obesity or other relevant health problems, and she leads a healthy lifestyle (moderate exercise; moderately healthy diet; no alcohol or drugs), all of which place her among the least likely to develop cancer in any subgroup. (Here's a concrete example: V is a non-smoker. However, if she smoked, her risk of developing breast cancer would be 14% to 28% greater depending on variables such as the age at which she started smoking (see here for details).)
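A note on reading "14% to 28% greater": that is a relative increase, not 14 to 28 percentage points added on top. The sketch below shows the arithmetic. Using V's 18.5% BCRAT figure as the baseline is purely my illustrative assumption; neither source applies the smoking numbers this way.

```python
def apply_relative_increase(baseline_risk, relative_increase):
    """Scale a baseline risk by a relative increase (0.14 means 14% greater)."""
    return baseline_risk * (1 + relative_increase)


baseline = 0.185  # illustrative baseline only (V's BCRAT lifetime figure)

for relative_increase in (0.14, 0.28):
    adjusted = apply_relative_increase(baseline, relative_increase)
    added_points = (adjusted - baseline) * 100
    print(
        f"{relative_increase:.0%} greater: {adjusted:.1%} "
        f"(about {added_points:.1f} percentage points higher)"
    )
```

With that baseline, a 14% to 28% relative increase adds only a few percentage points, far less than the headline percentages might suggest at first glance.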

In sum, the 47% figure seems to be an overestimate in V's case. The BCRAT results agree with me (although given that this tool doesn't include all relevant disaggregations, I wouldn't conclude that the 18.5% figure is perfectly accurate either).

Here, then, is my take-home message: Less than a 47% risk. How much less is impossible to know. My suggestion to V was to set aside the stats and focus on managing what can be managed – following her doctor's advice about increased monitoring, and maybe tweaking the lifestyle (a little more exercise, a little less fatty food, and, of course, seeking out ways to reduce stress). We're keeping our fingers crossed...

Thanks for reading!

Statisfied
