Testing Testosterone Is A Waste Of Time

Delineating gender in sport is not that easy

How we choose who competes in women’s sport is a difficult question that we’ve yet to answer well

You may have seen the screaming headlines about testosterone. Journalists from around the world are telling the same story: a new study has found, once and for all, that women who have higher levels of testosterone are better at sport.

As usual, it’s a bit more complicated than that.

Gender is complicated in elite sport. We’ve decided arbitrarily that the biggest biological advantage one person can have over another is being a man, and separated elite sport into men versus men and women versus women.

Unfortunately, gender isn’t that easily definable.

How do we decide what makes someone eligible to compete in women's sport?

The International Association of Athletics Federations is the governing body that lays down the law. They used to line up female athletes in what can only be described as an unnerving display of misogyny and force them to strip naked. When they realized that forcing girls as young as 13 to strip down to confirm that they did indeed have vaginas was a messed up — not to mention inaccurate — way to test ‘femininity’, they started using chromosomal tests that gave way in 2011 to testosterone testing.

The argument goes that testosterone matters because it increases things like lean muscle mass, so women who have more of it have an unfair advantage: they are more ‘manly’*. Testosterone is also supposed to be a better way to delineate who should be allowed to compete in women’s sport, because whilst some competitors are born with extra chromosomes or ambiguous genitalia, it is ‘male’ testosterone levels that should actually give a woman the upper hand.


It’s a big advantage: 4/5 beard competitions are won by men

Not exactly.

In 2015 the idea that testosterone levels could be used to delineate who could compete in women’s athletics was successfully challenged in the Court of Arbitration for Sport. They ruled that there wasn’t sufficient evidence that having a ‘high’ level of testosterone gave women a significant enough advantage that it would be unfair to allow them to compete.

Basically, whilst we know that testosterone improves athletic performance, there’s no evidence that women who naturally have high levels of the hormone (a condition called hyperandrogenism) are unfairly better at sport. More importantly, there’s no line you can draw between women who have ‘high’ testosterone and women who have ‘normal’ levels, because what constitutes ‘high’ and ‘normal’ is very much open to debate.

Pictured: ‘High’ that is not open to debate

The CAS gave the IAAF 2 years to come up with convincing evidence that high-testosterone women were unfairly better at sport than women in the normal range. The study all those journalists are referring to is the IAAF’s effort to do just that.

Unfortunately, it doesn’t show very much of anything at all.

The study itself is pretty basic: the researchers took 2100 athletes (1300 female and 800 male) and divided them up by gender, athletic event (e.g. the 100m sprint), and level of testosterone. There were 43 athletic events across men’s and women’s athletics, and each event was split into 3 testosterone levels (low, medium and high), giving a total of 43 × 3 = 129 groups.
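To make the grouping concrete, here is a minimal Python sketch. It assumes athletes are split into testosterone tertiles (three equal-sized thirds) and that the cut points come from the data itself; the function name `assign_tertiles` and the nine testosterone values are made up for illustration and are not taken from the paper.

```python
import statistics

def assign_tertiles(levels):
    """Label each testosterone value 'low', 'medium' or 'high' by tertile (hypothetical helper)."""
    # statistics.quantiles with n=3 returns the two cut points that split the data into thirds
    low_cut, high_cut = statistics.quantiles(levels, n=3)
    labels = []
    for value in levels:
        if value <= low_cut:
            labels.append("low")
        elif value <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels

EVENTS = 43                          # events across men's and women's athletics
LEVELS = ("low", "medium", "high")   # testosterone levels within each event
print(EVENTS * len(LEVELS))          # number of event x testosterone-level groups

# Made-up testosterone values for nine hypothetical athletes in one event
print(assign_tertiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

With nine evenly spaced values, the first three land in ‘low’, the middle three in ‘medium’ and the last three in ‘high’.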

They then did a number of statistical tests. Firstly, they checked to make sure that there was no significant effect of athletic event on testosterone, as this could confuse their results — if women who threw javelins had higher testosterone levels than women who ran long distance, you couldn’t use the same test for all of them.
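The post doesn’t say exactly which test the authors used for this check, so treat the following as an assumption: one standard way to ask whether average testosterone differs between events is a one-way analysis of variance (ANOVA). This sketch computes only the F statistic in plain Python, using hypothetical readings; turning F into a p-value needs the F distribution from a stats library.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of measurements."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: how far each event's mean sits from the overall mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of athletes around their own event's mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical testosterone readings for athletes in three events
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

A large F means the events differ by more than the spread within each event would suggest; the authors were hoping for the opposite, a small F showing no real effect of event.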

Then, for each event, they compared the results of the low-testosterone group with those of the high-testosterone group. They found no significant differences in the men’s events, but in the women’s events they found 5 in which athletes with high levels of testosterone had a statistically significant advantage: the 400m, the 400m hurdles, the 800m, the hammer throw, and the pole vault.
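The paper’s own comparison isn’t reproduced here. As a stand-in, this is a permutation test, a simple way to ask whether the high group’s results differ from the low group’s by more than randomly shuffling the group labels would produce. The function name and the race times are hypothetical.

```python
import random

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means; returns a p-value."""
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign athletes to the two groups
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance guards against float rounding
            extreme += 1
    # Adding 1 to both counts includes the observed split itself and keeps p above 0
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical 400m times (seconds) for a low- and a high-testosterone group
low_group = [52.1, 52.8, 53.0, 53.4, 54.0]
high_group = [51.2, 51.5, 52.0, 52.3, 52.6]
print(permutation_test(low_group, high_group))
```

Running one of these per event is exactly the ‘lots of coin flips’ situation described next: each call is one more chance for luck to produce a small p-value.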

And from this, you have thousands of stories about how testosterone is a significant boost for female athletes.

If you hadn’t guessed, these findings probably don’t mean that at all.

There’s an old adage that doing lots of tests in statistics is a bit like flipping a coin: if you do it enough times, you’ll eventually see heads.

Pictured: statistics

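What this means is that running statistical tests is about probability. When a paper reports that a test is significant, what the authors mean is that, if there were no real effect, a result this extreme would be unlikely to turn up through pure happenstance. We usually set that bar at 5%: a result counts as significant if luck alone would produce something at least as extreme less than 5% of the time. That is not the same as a 95% chance that the finding is true, and it means roughly 1 in 20 tests of an effect that doesn’t exist will still come out ‘significant’. For a more in-depth explanation, read this blog post.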
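A quick simulation shows what the 5% bar means in practice. This sketch assumes the simplest possible setup (two groups drawn from the same normal distribution with a known standard deviation of 1, compared with a z-test), so there is no real difference to find; the function names are hypothetical.

```python
import math
import random

def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means when the population SD is known."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def false_positive_rate(trials=10_000, n=20, alpha=0.05, seed=1):
    """Fraction of comparisons with no true effect that still come out 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]  # same distribution as a
        if z_test_p_value(a, b) < alpha:
            hits += 1
    return hits / trials

print(false_positive_rate())  # close to 0.05
```

Every ‘significant’ result in this simulation is a false positive by construction, and they still turn up about once in every 20 comparisons.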

But what happens when you run lots of tests? If the bar is set at 5%, running lots of tests pretty much guarantees that you will come up with a few ‘significant’ results, just like flipping a coin lots of times pretty much guarantees a heads.
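The arithmetic behind that claim is short. Assuming the tests are independent (an idealisation) and that no real effect exists, this sketch gives the chance of at least one ‘significant’ result and the expected number of them, for a single test, for 21 tests (the 5 + 16 comparisons in the women’s table) and for all 43 events. The function names are illustrative.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests is 'significant'."""
    return 1 - (1 - alpha) ** n_tests

def expected_false_positives(n_tests, alpha=0.05):
    """Average number of 'significant' results when no real effect exists."""
    return n_tests * alpha

for n in (1, 21, 43):
    print(n, round(chance_of_any_false_positive(n), 2), round(expected_false_positives(n), 2))
```

For 43 tests, that works out to roughly an 89% chance of at least one ‘significant’ result, with about 2 expected by luck alone.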

To deal with this, you usually apply a neat statistical trick called ‘adjustment for multiple comparisons’, which raises the bar for each individual test so that a ‘significant’ result is less likely to be just an artifact of chance.

Guess what the IAAF paper didn’t do?

You don’t have to guess, because they said it in the paper

They didn’t adjust their analyses.

What this means is that we have a problem: there were 43 coins ‘flipped’ in the paper, and 5 significant results, but there’s a good chance that this was just down to luck.

Here are all the tests they ran. Have a look and spot the ‘significant’ ones

So I went through the paper and applied a really common statistical adjustment known as the Bonferroni correction. What this basically does is raise the bar for statistical significance according to the number of tests that were run: with m tests, each individual p-value has to fall below 0.05 divided by m, rather than 0.05.
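Here is a minimal sketch of the Bonferroni correction in Python. The p-values in the example are made up (21 of them, five under 0.05 to mirror the shape of the women’s results) and are not the paper’s numbers; the helper name `bonferroni` is hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a list of p-values.

    Returns the corrected per-test threshold, the Bonferroni-adjusted
    p-values (capped at 1), and which tests survive the correction.
    """
    m = len(p_values)
    threshold = alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p < threshold for p in p_values]
    return threshold, adjusted, significant

# Made-up p-values: five would pass the usual 0.05 bar on their own
example = [0.004, 0.012, 0.021, 0.033, 0.045] + [0.5] * 16
threshold, adjusted, significant = bonferroni(example)
print(round(threshold, 4))  # 0.0024, i.e. 0.05 / 21
print(any(significant))     # False
```

Comparing each raw p-value to alpha / m is equivalent to comparing each adjusted p-value to alpha; the adjusted form is what most stats packages report.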

Once the correction is applied, none of them are actually significant.

So we know that these results probably weren’t significant. A more important question is why these 5 tests should matter anyway.

If you look at the table above, you’ll notice that there are 16 non-significant results, many of which had women with low testosterone levels beating women with high levels. From these results we could just as easily conclude that, for the majority of athletic events, testosterone levels had no impact on performance whatsoever.

It’s definitely what I would say.

And this brings us back to the media stories. I definitely don’t blame the journalists here. Bonferroni corrections and multiple comparisons require a bit of statistical knowledge, and honestly I had to ask for help to make sure I wasn’t just seeing things.

What it does mean is that every headline got it wrong.

All of them.

It may be true that naturally high testosterone gives women an unfair advantage, but this research definitely doesn’t demonstrate that. All it really shows is that, if you do enough tests, you’ll find something that supports your argument.

Which brings us back to why it matters what these results actually mean: they are going to be used to inform important decisions, like who is allowed to compete and who has their career destroyed because of a test.

Ultimately, it’s just another in a long line of failed tests that try to mandate who is allowed to be a woman in sport.

Maybe it should be the last one.


*This is a stupid argument. We could just as well argue that you should not allow women over six foot four to play basketball, because they have a man’s height.