Thursday, August 2, 2012

The Guardian's Statistical Analysis of Ye Shiwen's 400IM

The Guardian has a fantastic feature that dives into the data on Ye Shiwen's gold medal-winning 400IM at London 2012. The race has prompted unending debate and controversy, mainly because Ye swam the final leg faster than her male counterpart, Ryan Lochte of the US.

Just how rare is that feat? The Guardian does some impressive data analysis and concludes:
Looking at the results from the men's and women's 400m individual medley finals from London 2012 and from last year's World Championships in Shanghai, Ye's performance on Saturday is the only occasion where a female has completed either the final 50m or 100m in a faster time than any male finalist.

At this stage Ye's performance certainly does seem statistically extraordinary, but widening the data-set to include lower pedigree events shows her to be less of an outlier.
Adding the results for 400m medley finalists at the US Olympic trials, 2012 European Championships and 2010 Asian Games raises the number of freestyle phases by female swimmers that beat at least one male finalist to 27.

But Ye's performances remain statistically remarkable. Of the 40 freestyle legs swum by male finalists, her 58.68 second split beats almost three quarters (27).

It remains impossible to say with any certainty that Ye's swim was anything other than mightily impressive, but it is clear that her performances are in a different statistical bracket to contemporary world class female medley swimmers.
This deeper analysis confirms my earlier conclusion, drawn from much less data, that answers to questions of doping simply cannot be found in performance data. As Ross Tucker at Sporting Science explains (emphasis in original):
performance analysis will never constitute proof.  Let me repeat that - performance analysis will never constitute proof.  Put differently, in case you missed it - you cannot analyze someone's performance, be it their age, their improvement, their splits, and infer that they are doping based solely on these observations.  Why?  Because performance is too complex, and we have neither the sensitivity nor the reliability to make a strong conclusion.

So, what performance analysis can do is ask questions.  That's all.  The questions are asked based on what we see, but they are answered based on what we can find.  I'm all for questions - I think that questions drive transparency, and transparency is the key to cleaning up sport. . .

But that's where the answer lies.  The question is found in performances, in history, and while these can easily become generalizations, they are not bad, in and of themselves.  The answers, however, lie in "detective work", and testing.  Stop looking for "smoking guns" - you will not find them.
One of the hallmarks of good governance in sport is a commitment to due process. Ye Shiwen should be treated in accordance with that commitment. She is innocent until proven guilty. The rush to judgment by certain members of the media playing judge, jury, and executioner is not just unseemly, but counter to the ideals of sport.
