Relative Age Effect – UK Athletics

Introduction

In my previous post I examined the relative age effect in European football. The relative age effect describes how the point in the year at which your birthday falls can affect your life chances.  People who grow up as the oldest in their age group are over-represented in sporting teams and have been observed to perform better academically.

In this post I apply the same analysis to UK track and field athletics. As a club athlete myself I was interested to see if the same effect would be apparent in an individual sport. Most studies of the relative age effect have been on team sports, and explanations for it often cite how youth team selection policy will favour the most physically developed older children. I didn’t think it obvious that this would be an issue for athletics.

My data set for football was the ranks of professional players. For athletics I took the comprehensive ranking lists of performances from the superb Power of Ten site.

All-time performances

The graph below shows the distribution of birthdays for athletes ranked in the top 100 of all time from England or Wales, in any event.

all_time_ew

In England and Wales children are divided into school classes in line with the academic year, so a child with a September birthday will be among the oldest.  Athletics age groups are grouped in the same way.

A clear trend can be seen above, with months at the start of the academic year being over-represented compared to those at the end. This is a clear illustration of a relative age effect. Athletic performance is entirely objective, unlike football where selection for a team is based on someone’s subjective judgement. Yet even in this most meritocratic of sports the bias is unmistakable.
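
For anyone wanting to reproduce this kind of graph, the monthly tally behind it could be sketched in Python roughly as follows. This is a minimal illustration rather than the code used for the post: the function name and the sample birthdays are made up, and the only thing taken from the text is that the year runs from September to August.

```python
from collections import Counter
from datetime import date

# Months in academic-year order, September through August.
ACADEMIC_MONTHS = [
    (9, "Sep"), (10, "Oct"), (11, "Nov"), (12, "Dec"),
    (1, "Jan"), (2, "Feb"), (3, "Mar"), (4, "Apr"),
    (5, "May"), (6, "Jun"), (7, "Jul"), (8, "Aug"),
]


def birthdays_by_academic_month(birthdays):
    """Return (month name, count) pairs ordered from September to August."""
    counts = Counter(b.month for b in birthdays)
    return [(name, counts.get(number, 0)) for number, name in ACADEMIC_MONTHS]


# Made-up birthdays purely for illustration, not real ranking data.
sample = [date(1990, 9, 14), date(1988, 9, 30), date(1992, 11, 2), date(1991, 8, 25)]

for name, count in birthdays_by_academic_month(sample):
    print(f"{name}: {count}")
```

Ordering the months from September rather than January is what makes any skew towards the start of the academic year show up as a slope from left to right.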

In the previous article I defined a few different measures for the skew in the distribution of birthdays.  Here they are again:

  • Offset is the gap, in days, between the average birthday and the middle of the year.  A negative value means the average birthday falls before the midpoint, that is, towards the start of the year.  This is the most obvious measure of how skewed a distribution is and how it compares to others.  However, a number of days probably doesn’t make it intuitively clear how distorted the distribution is, so I’ve also given a couple of other metrics.
  • First half percentage is the percentage of birthdays falling in the first half of the year, so a value greater than 50% indicates a bias towards the start of the year.
  • First to fourth quarter ratio is the relative frequency of those born in the first quarter of the year compared to the last, so a value of 2 would indicate twice as many.
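
As a rough sketch, the three measures above could be computed along these lines. Several details here are my assumptions rather than anything stated in the post: the year is taken to start on the 1st of a given month (September by default), the midpoint is taken as day 182 of a 365-day year, halves and quarters are taken as blocks of six and three calendar months, and 29 February is treated as 28 February. The function names and sample birthdays are hypothetical.

```python
from datetime import date


def day_index(birthday, start_month=9):
    """Days from the 1st of start_month to the birthday, from 0 to 364."""
    # Map every birthday onto a fixed non-leap reference year (2001/2002).
    day = 28 if (birthday.month, birthday.day) == (2, 29) else birthday.day
    start = date(2001, start_month, 1)
    ref = date(2001, birthday.month, day)
    if ref < start:
        ref = date(2002, birthday.month, day)
    return (ref - start).days


def skew_measures(birthdays, start_month=9):
    """Return (offset in days, first half percentage, first to fourth quarter ratio)."""
    n = len(birthdays)
    if n == 0:
        raise ValueError("need at least one birthday")
    # Position of each birth month within the year: 0 for the first month, 11 for the last.
    positions = [(b.month - start_month) % 12 for b in birthdays]
    mean_day = sum(day_index(b, start_month) for b in birthdays) / n
    offset = mean_day - 182  # negative: average birthday is before mid-year
    first_half_pct = 100 * sum(p < 6 for p in positions) / n
    q1 = sum(p < 3 for p in positions)
    q4 = sum(p >= 9 for p in positions)
    ratio = q1 / q4 if q4 else float("inf")
    return offset, first_half_pct, ratio


# Made-up birthdays purely for illustration.
sample = [date(1990, 9, 1), date(1991, 10, 1), date(1992, 3, 2), date(1993, 8, 31)]
print(skew_measures(sample))
```

Passing a different `start_month` lets the same measures be taken against a year that starts at another point, such as January for a calendar-year grouping.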

Using these measures, below is a breakdown of the England and Wales all-time ranking data by event type.

Event type   Count   Offset (days)   First half (%)   First to fourth quarter ratio
Throws         677          -25.34            58.05                            1.97
Sprints        694          -21.50            57.20                            1.74
Jumps          677          -20.77            59.38                            1.72
Distance       692          -12.75            56.79                            1.28

So the distribution across events isn’t even: the skew is far less pronounced in distance running than in the other events.  For the purposes of this analysis, 800 metres and up is considered distance.  For comparison, the offset I found in European football leagues was roughly between 20 and 30 days, so about in line with the non-distance events.
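
To make the grouping concrete, here is one way the split into event types might be coded. The only rule taken from the text is that 800 metres and up counts as distance. The event labels (plain metre distances as strings, and two-letter codes for the field events) are hypothetical stand-ins, and anything the sketch doesn’t recognise, such as hurdles or road events, is simply left unclassified.

```python
# Hypothetical event labels; the real ranking lists may name events differently.
JUMPS = {"HJ", "PV", "LJ", "TJ"}
THROWS = {"SP", "DT", "HT", "JT"}


def event_group(event):
    """Classify an event label as Sprints, Distance, Jumps or Throws."""
    if event in JUMPS:
        return "Jumps"
    if event in THROWS:
        return "Throws"
    if event.isdigit():
        # Flat track events given as a distance in metres: 800 and up is distance.
        return "Distance" if int(event) >= 800 else "Sprints"
    return None  # not covered by this sketch


for label in ["100", "400", "800", "5000", "LJ", "SP"]:
    print(label, event_group(label))
```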

The most pronounced relative age effect is in the throwing events.  Here’s a graph of the distribution.  The contrast between August and September birthdays is pretty stark.

all_time_ew_throws

Here’s the same for distance running.

all_time_ew_distance

As you can see, it’s not particularly obvious to the eye that there’s a skew towards the start of the academic year at all.

Recent seasons

To check that this effect is currently in force, rather than just historical, I repeated the analysis for the last two seasons.  Here I included athletes achieving a top 200 ranking for the season in an event, far deeper than the all-time performances for the previous test.  As the graph below shows, the effect remains very visible.

ew-2015-2016-top200

As before, the skew for distance events is far less pronounced than for the other disciplines.

Event type   Count   Offset (days)   First half (%)   First to fourth quarter ratio
Throws        1142          -26.51            61.56                            1.83
Sprints       1375          -26.30            60.36                            1.88
Jumps         1284          -24.67            60.44                            1.85
Distance      1185          -12.68            54.68                            1.49

The distribution of birthdays is slightly more skewed for men than women.

Sex          Count   Offset (days)   First half (%)   First to fourth quarter ratio
Men           2395          -24.40            59.50                            1.84
Women         2137          -20.46            58.73                            1.67

So in summary, athletics in England and Wales shows a clear relative age effect.  In the senior age groups there is a bias towards those born at the start of the academic year, but this bias is diminished in the distance running events compared to the others.

I’ll now look at two datasets that shed a bit of light on what produces this bias: the junior rankings and Scotland.

Junior rankings

I performed the same analysis on the junior age groups in England and Wales, examining the distribution of birthdays for athletes ranked in the top 200 for an event in the last two years.

Rankings of junior athletes differ from those from the senior ranks in that you would expect those born earlier in the term to have an advantage. UK athletics age categories run as the school term does, from September. So a child born in September will always be in the same category as a child born the following August, and will always have the advantage of almost a full year of additional physical development. On the other hand, once you reach the senior ranks there is no reason why your birthday should help. So when looking at the junior data we should bear in mind that a bias is expected, but the relative amount of that bias is still revealing.

The grid below shows the offset of the average birthday broken down by age group and type of event. As a reminder, this offset is how far the average birthday is ahead of the middle of the year. So in the case of U15 sprints, an offset of around 50 days means that instead of the average birthday falling around the end of February, as we would expect with a term starting in September, it actually falls around a month and a half earlier in mid-January.
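
The date arithmetic in that example can be checked with a few lines of Python. This is only a sketch under my own assumptions: the year starts on 1 September, the midpoint is taken as 182 days later in a non-leap year, and the function name is made up. With these assumptions the midpoint comes out at 2 March, within a couple of days of the end of February quoted above.

```python
from datetime import date, timedelta


def average_birthday(offset_days, start_month=9):
    """Turn an offset in days into a calendar date in a reference year."""
    # 2001/2002 is an arbitrary non-leap reference year; only the
    # month and day of the result are meaningful.
    start = date(2001, start_month, 1)
    midpoint = start + timedelta(days=182)
    return midpoint + timedelta(days=round(offset_days))


print(average_birthday(0))      # an unbiased distribution
print(average_birthday(-50.1))  # the U15 sprints offset
```

An offset of -50.1 lands on 11 January, which matches the mid-January figure given in the text.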

Age group   Sprints   Distance   Jumps   Throws     All
U13           -70.3      -44.0   -62.1    -60.6   -56.9
U15           -50.1      -29.0   -40.1    -42.8   -39.4
U17           -39.7      -18.8   -43.4    -37.3   -33.2
U20           -20.5       -4.6   -17.7    -17.8   -15.3

The grid illustrates two things.

First, as you might expect, the advantage of being born earlier in the term diminishes as you progress through the age groups, simply because the relative advantage of a year’s physical development is less when you are 19 compared to 14.

Second, the same diminished bias is evident in distance events in the junior categories as for seniors.  It has almost vanished in the under 20 category; for these two years the bias is actually less than for the seniors.

Here’s a plot of the most extreme bias given above, the U13 sprints.

ew-u13-boy-sprints-2015-16-top200

There is an extraordinary bias towards September birthdays.  Interestingly, the effect seems to be very heavily concentrated in the first few months of the academic year and tails off a lot after that.

As mentioned, a bias in the junior ranks would be expected.  But we can see how the sometimes dramatic advantage enjoyed by those born at the start of the term could encourage them to continue in the sport at the expense of their contemporaries born later in the term.

I can only speculate as to why there is a smaller bias for distance running.  A year of extra physical development will help you here as it would for other events.  However it may be that, in relative terms, it is less of a help and that individual ability or willingness to train comes to dominate more quickly than it does for other events.

Whatever the reason, there is a reduced bias in the junior events for distance running, which is reflected at the senior level.  There’s no in-built advantage to a birthday at the start of the term once you’re a senior.  The data suggests that it is the relative advantage enjoyed as a junior that shapes the profile of those who achieve some success as a senior.

The explanation for the relative age effect tends to be that the older juniors in an age category get selected for representative teams and thus benefit from greater opportunities to improve.  This could apply in athletics, but I suspect, as a largely individual sport, it is more a case of self-selection.  It’s more fun to win a race than to come last, whether that’s because you’re the oldest or not, and so the older children in a cohort are more likely to enjoy it, feel they have talent and get further involved.

Scotland

My previous post on the effect in football considered the case of Scotland, where the age categories differ from those in England and Wales. In athletics the age groups for competition are the same throughout the UK, so the Scottish rankings are also based on age groups starting in September. However, the academic year at school is arranged so that the oldest children in the year are those born in March.
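
The mismatch between the two groupings can be illustrated with a small sketch. It assumes, as a simplification of my own, that each grouping starts on the 1st of its month (September for athletics age groups, March for the Scottish school year); the function name and the two sample birthdays are made up.

```python
from datetime import date


def months_into_year(birthday, start_month):
    """Whole months between the start of a grouping year and the birth month."""
    return (birthday.month - start_month) % 12


ATHLETICS_START = 9  # athletics age groups start in September
SCHOOL_START = 3     # oldest children in a Scottish school year are born in March

for label, birthday in [("September", date(2004, 9, 15)), ("March", date(2005, 3, 15))]:
    print(
        label,
        "birthday: months into athletics year =", months_into_year(birthday, ATHLETICS_START),
        ", months into school year =", months_into_year(birthday, SCHOOL_START),
    )
```

A September birthday is at the very start of the athletics year but six months into the school year, and a March birthday is the reverse, so each child is among the oldest in one grouping and mid-pack in the other.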

Here’s the distribution for Scottish senior athletes ranked in the top 200 for an event in the past two years.  Note that the ranking data I’m using only goes so deep as there is a performance level cut-off, so by dint of being a smaller country there isn’t as much data available.

s-2015-16-top200

It’s not clear that there is any relative age effect at all.  Although it’s not very pronounced you can just about see that there are peaks in September/October and March/April.  This might imply that there is an effect favouring those born at the start of the athletics age group, and another favouring the oldest in the school year.

Below is a similar grid to the one that was given previously for England and Wales, showing how the relative age bias changes as the age groups progress.

Age group   Sprints   Distance   Jumps   Throws     All
U13           -47.5      -44.7   -41.5    -68.7   -40.1
U15           -25.6      -19.6   -22.8    -38.9   -24.2
U17           -10.7       -7.5   -17.2    -22.1   -12.4
U20            -0.6        1.3    14.1     -1.8     2.2

Just as in England and Wales there is a pronounced bias in the youngest age group.  However, the bias falls much more quickly, and has gone by the under 20 category.

The plot below of the under 17 category shows how the distribution has evened out by this point.

s-u17-2015-16

For comparison here’s the same graph for England and Wales, showing a very clear and smooth bias to the distribution.

ew-u17-2015-16

The data is a bit thin for Scotland so we have to be a little careful in drawing conclusions.  I can only speculate but it does appear that there may be a counterbalancing effect between the two age groupings.  On the one hand those born in the autumn have the advantage of competing against younger opposition in UK athletics events.  On the other hand those born in spring will be older than their contemporaries at school.  They may take encouragement if their greater physical development lends them an advantage and stick with the sport when they may not have otherwise.

Conclusion

Athletics in the UK experiences a relative age effect.  Furthermore we can make the following observations:

  • There is a historic effect that applies to the highest level of the sport affecting all-time top performances.
  • The effect is visible right now in performances over the last couple of seasons, applying to a much deeper level of performance.
  • In Scotland, where the academic year doesn’t match the competition age group year, the effect is absent at senior level.
  • There is less of an effect on distance running compared to other disciplines, from junior through to senior level.

Of course throughout we’re only dealing in aggregate results and there will be individuals who buck the trend.  Having a birthday in the last week of August didn’t stop Denise Lewis becoming Olympic champion.

I started this investigation because my daughter Beatrix was born this July, so I was actually hoping not to find evidence of the relative age effect.  However, if she was hoping to become a champion sprinter or shot-putter, her choice of parents will prove more of a handicap than her birthday.  My wife and I are both distance runners.  If Beatrix’s interests lie in the same direction, then the distribution below should bring some comfort.  It’s for the all-time top 100 ranked women in distance events, and actually shows an inverse age effect with more birthdays in August than any other month.  So maybe I’ve found what I wanted after all.

ew-womens-distance

Sources

All data comes from the Power of 10.

The Python code can be found here on GitHub.

bea_in_trainers
