Calcium and heart attacks – an example of bad science

Preamble: the following post contains discussion of terms that aren’t commonly used in everyday language.  I’ve attempted to explain them in the glossary, but some are probably still as clear as mud.  Please ask questions if something’s totally confusing, or give me suggestions on how to make the concepts clearer. And with that, on with the story…

I talked briefly last week about how abstracts of scientific papers can be misleading, so we need to read the whole paper in order to critically appraise it. Sadly, sometimes when you go through that critical appraisal process you find that the claims made in the abstract just don’t hold up. Critical appraisal is a broad topic with lots of components, so in this post I’m just going to focus on one aspect: do the numbers say what the authors claim that they say?

I’m going to take as my example a paper that I’ve read recently that actually really irritated me. Conveniently it’s open access, so anyone who fancies can go and read: “Calcium supplements with or without vitamin D and risk of cardiovascular events: reanalysis of the Women’s Health Initiative limited access dataset and meta-analysis”. As the title suggests, this paper has taken some large datasets and examined whether people who take calcium supplements are at higher risk of heart attacks and strokes. The conclusion of the abstract tells us:

Calcium supplements with or without vitamin D modestly increase the risk of cardiovascular events, especially myocardial infarction, a finding obscured in the WHI CaD Study by the widespread use of personal calcium supplements. A reassessment of the role of calcium supplements in osteoporosis management is warranted.

So how did they reach that conclusion? Let’s take a look. The data for this analysis comes from the calcium and vitamin D trial of the Women’s Health Initiative (also known as the WHI), which was designed to see if giving those supplements to 36,282 women who had been through the menopause would reduce their risk of hip fracture (they do, but the reduction’s not huge, and we can’t be sure that it’s not just due to chance). They later re-analysed the data, and found that after seven years the supplements had no effect on the risk of heart attacks or strokes. So far so good.

Bolland et al, the authors of the paper we’re looking at, had some concerns about this analysis – it turns out that just over half of the participants were taking their own calcium supplements. “Hang on!”, you might say, “how can we tell what effect calcium has, if some of them are taking their own calcium AND the stuff the study doctors gave them?!”. And you’d be right – this is potentially a big confounder.

So Bolland and chums took the WHI data, separated it according to whether the women were taking their own calcium or not, and then separated it again by whether they were then given study calcium or a placebo sugar pill. Confused yet? Me too, and we haven’t even really started. To simplify it slightly, here are the four groups of women we’ve ended up with (there’s a short code sketch after the list if that’s easier to follow):

A. Women who only took WHI calcium

B. Women who took nothing at all except placebo

C. Women who took their own calcium AND WHI calcium

D. Women who only took their own calcium (plus a placebo)
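
In case the splitting is easier to follow as code, here’s a minimal sketch of the same idea. The function name and the two flags are my own invention for illustration, not the WHI dataset’s actual variables: each woman’s group depends only on whether she was already taking her own calcium and whether she was randomised to study calcium or to placebo.

```python
def assign_group(taking_own_calcium: bool, randomised_to_study_calcium: bool) -> str:
    """Label a participant A-D from her own supplement use and her study arm (illustrative only)."""
    if randomised_to_study_calcium:
        return "C" if taking_own_calcium else "A"
    return "D" if taking_own_calcium else "B"

print(assign_group(taking_own_calcium=False, randomised_to_study_calcium=True))   # A
print(assign_group(taking_own_calcium=False, randomised_to_study_calcium=False))  # B
print(assign_group(taking_own_calcium=True,  randomised_to_study_calcium=True))   # C
print(assign_group(taking_own_calcium=True,  randomised_to_study_calcium=False))  # D
```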

They then looked at lots of different outcomes or endpoints in these women. It’s normal practice in a trial to have one “primary outcome”, and maybe a few secondary ones. In this case, we have nine:

  1. Clinical heart attacks (that is, heart attacks where the person had symptoms, sought medical attention, and received treatment)
  2. Total heart attacks (includes all of the heart attacks in the group above, but also adds in ones that were only detected later by changes seen on ECG tests)
  3. Revascularisation (people who had a coronary artery bypass graft, or another procedure to restore healthy blood flow to the heart muscle)
  4. Stroke
  5. Combination of total heart attacks plus all deaths from coronary heart disease
  6. Combination of clinical heart attacks plus revascularisations
  7. Combination of clinical heart attacks plus strokes
  8. Combination of total heart attacks, plus all deaths from coronary heart disease, plus revascularisations
  9. Death from any cause

Seem a bit over the top? That’s because it is. There are very good reasons we normally only choose one primary endpoint for a trial, and maybe 3 or 4 secondary ones. The first is so that the authors can do some sums ahead of time, and work out how many people they need to enrol to answer the question properly. This is referred to as “statistical power” and it’s an important topic, but it needs its own blog post to do it justice. I’ll get to it one of these days. In any case Bolland et al had no control over how many people were enrolled, as they were using someone else’s data.

The second reason applies though, and it’s this. When a scientist does the sums at the end of a trial to figure out the results, usually they will include what’s known as a p value. The p stands for probability, and the p value very simply represents how likely it is that the results of the trial could have happened by chance alone, if the treatment really made no difference. Any time the p value is less than or equal to 0.05 (the same as 5%), we say the results are statistically significant; that is, they’re unlikely to be down to chance.

That was a rather quick and simplistic explanation, but the one thing you need to take away from it is this: if you ever see p = 0.05 written down, there is a 95% chance that the results you’re reading are accurate, and a 5% risk that they are due to chance, and are wrong. Put another way, that is a one in twenty risk of an incorrect result.
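
If you’d like to see that one in twenty risk in action, here’s a minimal sketch in Python. It has nothing to do with the paper’s data; it just simulates imaginary two-arm trials in which the treatment genuinely does nothing, and counts how often a standard significance test still comes out “significant” at p ≤ 0.05.

```python
import numpy as np
from scipy import stats

# Simulate imaginary trials where the treatment truly has no effect,
# and count how often a t-test still gives p <= 0.05 purely by chance.
rng = np.random.default_rng(0)
n_trials, n_per_arm = 10_000, 200
false_positives = 0
for _ in range(n_trials):
    treatment = rng.normal(loc=0, scale=1, size=n_per_arm)  # no real difference
    control = rng.normal(loc=0, scale=1, size=n_per_arm)    # between the arms
    _, p_value = stats.ttest_ind(treatment, control)
    false_positives += p_value <= 0.05

print(f"'Significant' results despite no true effect: {false_positives / n_trials:.1%}")
# Expect roughly 5% -- the one in twenty risk described above.
```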

And we know that so far, Bolland et al have nine endpoints. Except…they don’t. As outlined above, they split the women into four groups, which I’ve called A, B, C & D for simplicity. Then they compared groups A and B for each of the nine endpoints, and they compared groups C and D for each of the nine endpoints. So actually, there are eighteen comparisons here.
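
To see why that matters, here’s a minimal sketch of how the chance of at least one fluke “significant” result grows with the number of comparisons. It assumes, purely for illustration, that the comparisons are independent of each other (the paper’s endpoints certainly aren’t entirely independent, so treat the exact numbers as ballpark figures).

```python
# Chance of at least one false positive when each test has a 5% false-positive rate.
alpha = 0.05
for n_comparisons in (1, 9, 18):
    chance_of_a_fluke = 1 - (1 - alpha) ** n_comparisons
    print(f"{n_comparisons:>2} comparisons -> {chance_of_a_fluke:.0%} chance of at least one fluke")
# Roughly:  1 comparison  ->  5%
#           9 comparisons -> 37%
#          18 comparisons -> 60%
```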

For each comparison, a hazard ratio is reported. Simply speaking, in this case, the hazard ratio represents how likely a person in group A is to have a heart attack (for example) compared to someone in group B, after a certain amount of time. In this paper, the hazard ratio for heart attacks in group A vs. group B was 1.22. We know that the women took supplements for seven years, so the hazard ratio tells us that for these postmenopausal women, seven years of calcium and vitamin D supplements makes you 1.22 times as likely (or 22% more likely) to have a heart attack. Or does it? The next section tells you…maybe not.
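
If you’d like to see what “1.22 times as likely” could mean in absolute terms, here’s a minimal sketch. The 5% baseline risk is a made-up number for illustration only (it’s not taken from the paper), and the calculation assumes proportional hazards, which lets us scale the chance of avoiding a heart attack by the hazard ratio.

```python
# Under proportional hazards, the treated group's chance of avoiding an event
# is the control group's chance raised to the power of the hazard ratio:
#   S_treated(t) = S_control(t) ** HR
baseline_risk = 0.05      # assumed 5% risk of a heart attack over seven years (illustrative)
hazard_ratio = 1.22       # the example hazard ratio quoted above

treated_risk = 1 - (1 - baseline_risk) ** hazard_ratio
print(f"Control risk over seven years: {baseline_risk:.2%}")
print(f"Treated risk over seven years: {treated_risk:.2%}")  # about 6.1% under these assumptions
```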

A confidence interval is also reported for each result, which is a useful partner to the p value. For the example hazard ratio above, the 95% confidence interval was 1.00 to 1.50, and that’s very interesting, because a hazard ratio of one means there’s no difference. Since the 95% confidence interval includes one as a possible value, the result is still consistent with there being no real difference at all; maybe this effect isn’t as big as we thought.
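
For the curious, here’s a minimal sketch of how a 95% confidence interval for a hazard ratio is usually built, on the log scale. The standard error below is a made-up number chosen only so the interval comes out close to the one quoted above; it’s not a figure from the paper.

```python
import math

def hazard_ratio_ci(hr, se_log_hr, z=1.96):
    """95% confidence interval for a hazard ratio, built on the log scale."""
    log_hr = math.log(hr)
    lower = math.exp(log_hr - z * se_log_hr)
    upper = math.exp(log_hr + z * se_log_hr)
    return lower, upper

hr = 1.22
se_log_hr = 0.103  # assumed standard error of log(HR), for illustration only
lower, upper = hazard_ratio_ci(hr, se_log_hr)
print(f"Hazard ratio {hr}, 95% CI {lower:.2f} to {upper:.2f}")
# If the interval includes 1.0, the data are still consistent with no difference.
print("Interval includes 1.0?", lower <= 1.0 <= upper)
```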

Right, enough preamble; what did they find?  This post is quite long enough, so I’ll save that part for next time.