My interview in SC Magazine on why IT vendors should explain the computing limitations and the actual sophistication of the machine learning algorithms in their security products.
My interview on BBC Radio Foyle about Dixons Carphone, which said that a huge data breach last year involved 10 million customers, up from its original estimate of 1.2 million.
My interview on BBC Radio Foyle where we discuss the technology implemented by Dimension Data for ASO to enable the tracking of riders in real-time at the Tour de France.
The richness of data which lies beneath the saddle of a Tour de France Rider
I was fortunate to be invited to Le Tour de France by Dimension Data, the Official Technology Partner of the organisers, Amaury Sport Organisation (ASO), to see how they provide real-time tracking of riders. Dimension Data are also title partner of Team Dimension Data for Qhubeka, Africa’s first ever UCI World Tour team, which joined cycling’s top ranks in 2016 and has been building to become one of the top teams in world cycling.
Dimension Data are a serious player in digital infrastructure, hybrid cloud and cybersecurity. They have a turnover of USD 7.5 billion, offices in 52 countries, and 30,000 employees. Dimension Data are now in the fourth year of a five-year contract to be the official technology partner of Amaury Sport Organisation (ASO).
What interested me most on this trip was how Dimension Data were able to provide the digital infrastructure to track each rider in real time for over 3,300 kilometres. I confess that I did not think this was a difficult problem until some of the real-world obstacles were explained to me. In a nutshell, France is a large country and not everywhere has cellular coverage, especially the mountainous and rural routes that the riders take. In addition, there are thousands of people along the route all trying to connect to the same cellular networks that you might expect le Tour to use to send data onwards. Now I completely understand why it took 112 years to have live tracking on le Tour. Here I hope to give you an insight into how Dimension Data overcame these obstacles and delivered a world-class solution.
A brief history of Tour coverage
Running from Saturday July 7th to Sunday July 29th 2018, the 105th Tour de France will be made up of 21 stages and will cover a total distance of 3,351 kilometres. The 2018 Tour de France will include a total of 26 mountain climbs or hills and altitude finishes. The Tour is organised by Amaury Sport Organisation (A.S.O.), a company that owns, designs and organises top international sporting events around the globe, including of course le Tour de France. It is a subsidiary of the Amaury Group, the media and sport group that owns the newspaper L’Equipe.
This race was first organised in 1903 to boost sales for the newspaper L’Auto, which published regular updates on the different stages of the race. The Tour de France’s popularity has grown for over a century, and it continues year on year to attract huge media attention. It is the world’s largest cycling race by far, and is the third largest sports event after the FIFA World Cup and the Olympics. Of course, it was different in the beginning, when it was only covered by journalists from the organising newspaper L’Auto. In fact, they did not want rivals covering the race, and it was another 18 years before they allowed other reporters to cover it. Cinema newsreels would provide footage a day or two after the event. In 1929, the first live radio broadcast was made by the newspaper L’Intransigeant for Radio Cité using plain old telegraphy lines. In 1932 they broadcast the sound of riders crossing the col d’Aubisque in the Pyrenees. Television pictures came at that time but were delayed by a day, as film had to be taken by plane or train to Paris and edited. The first live broadcast was for the finish at the Parc des Princes in Paris in 1948, and the first from the side of the road was in 1958. It was only a year later that helicopters were first used for TV coverage. Now, of course, live TV coverage takes place from start to finish.
Amaury Sport Organisation (ASO) of course want to embrace technology, as other sports such as Formula One do, so that audiences can become even more engaged. We forget that prior to 2015, riders were tracked manually, with stopwatches and timing boards and officials radioing back to HQ to report which rider had passed a particular milestone. Grouping riders was also done manually by officials hanging off the back of a motorcycle or sticking their heads out of vehicles to check race numbers and count the riders in each group. Quite often a rider would go ‘missing’ from official statistics for long periods. It was a lottery for each team.
ASO undertook an analysis of the Tour de France in 2014 and discovered an ageing audience and little major innovation in TV coverage for many years. Strategies and race tactics were difficult for viewers to understand, and the feedback showed that the audience wanted more engagement. For instance, in Formula One, fans can use digital platforms on multiple devices to view live leader boards, in-car cameras, interactive forums, and circuit guides. ASO wanted to replicate this and give fans and commentators real-time information about each rider’s performance, because until then only the peloton and breakaway groups could be tracked. Tracking an individual rider was not possible prior to Dimension Data’s solution.
Dimension Data, a global IT services company headquartered in South Africa, were seen as the ideal global service integrator. Ultimately, the final platform delivered in 2015 included real-time big data analytics, elastic cloud infrastructure, mesh networks, contemporary digital platforms, cutting-edge security, custom hardware, advanced collaboration technologies, and agile software. They leverage tools like Confluence to enable effective design collaboration, Jira for software development planning, and Octopus Deploy to facilitate clean, repeatable, and automated installation and roll-out of new code and builds into the platform.
Tracking individual riders is not easy, however. The course is long and diverse and travels through rural France for much of the time. Timing for other events can be handled quite easily by traditional watchmakers, but an event like the Tour requires numerous IT vendors and components and brings considerable technical complexity. Any solution must be sufficiently robust to handle a spike in data feeds, such as when crashes occur. There is a famous tweet of the crash in 2015 by @letourdata.
This data-rich tweet from @leTourData generated 31,801 engagements and doubled the account’s number of followers. Social media engagement has exploded since they created the live-tracking site. It turns out that second-screen viewing of the Tour has increased by more than 50% in 3 years. The Tour has over 7 million followers on social media, 75% of whom are in the 18 to 35 age group. Its social media fans increased by 141% from 2014 to 2017, and there was a 1,000% growth in traffic to the live-tracking website (Race Center) over the same period.
The centerpiece of the live tracking of riders is a weatherproof, low-power, light GPS tracker fitted to each bike (see below). It had to be designed and manufactured specially. Interestingly, radio signals are blocked by both the carbon fiber in the seat and the cyclist, so the transmitter had to be outfitted with a small antenna that points over the rear tire. It also had to be designed to allow quick swaps for both charging needs and equipment changes. You will see cyclists on TV having to change bikes during a race, and they now instinctively grab the transponder and attach it to the new bike.
In fact, on this year’s brutal Arras to Roubaix stage 9 of the Tour, which featured 21.7km of cobbles over 15 sections, Romain Bardet suffered three punctures, requiring three bike changes and three subsequent swaps of the tracker. The tracker has become smarter, smaller and lighter with each new version. It now contains GPS sensors and various inertial measurement capabilities including 3D accelerometers, gyroscopes and a magnetometer.
These trackers communicate with each other in a mesh network and with gateways in television motorbikes and official cars. Basically, a mesh network is one where communicating devices (i.e. GPS trackers) can connect directly and dynamically to nearby nodes (i.e. motorbikes and cars) to route data efficiently back to central HQ. It is a data relay system, and good line of sight is crucial. The latest tracker has a transmission range of one kilometer over a wide-area wireless network using the IEEE 802.15.4 standard. This is the primary transmission network between telemetry devices and relay points. In turn, it establishes a moving mesh network that can use the other telemetry devices as reference points to enhance the accuracy of the location coordinates and can pick the best relay points in the vicinity.

The secondary transmission network streams data to a helicopter, from where it is relayed to an end-of-race receiver in the technical zone near the finish line of each stage. Planes are also used. These aircraft send the data with long-range transmitters to the Command Hub at the finish line. Dimension Data analyse over 135 million data points on each stage of the 2018 race (incidentally fewer than the 150 million data points per stage of the previous race, which was longer).

Edge processing is also used, and it is actually ideal for this use case. Edge processing is a method of optimizing systems by moving the control of data away from central nodes to the “edge”, where the system makes contact with the physical world, which in this case is on the road. Travelling through tunnels, for instance, also requires virtual updates to fill in any gaps in the data. This is a visual representation of the technical solution.
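To make the relay idea concrete, here is a minimal sketch of how a tracker's data might hop across a mesh until it reaches a gateway. It is my own illustration and not Dimension Data's routing logic: the node names, the flat (x, y) coordinates in metres and the fewest-hops rule are all assumptions, and the only figure taken from the description above is the one-kilometre radio range.

```python
import math
from collections import deque

RANGE_M = 1000.0  # one-kilometre radio range quoted in the text


def distance_m(a, b):
    """Straight-line distance between two (x, y) points in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def route_to_gateway(nodes, gateways, source, radio_range=RANGE_M):
    """Breadth-first search for the fewest-hop path from source to a gateway.

    nodes:    dict of name -> (x, y) position in metres (hypothetical layout)
    gateways: set of node names that can forward data onwards
    Returns the list of node names on the path, or None if no gateway
    can be reached within radio range.
    """
    previous = {source: None}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current in gateways:
            # Walk back through the recorded hops to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for name, position in nodes.items():
            in_range = distance_m(nodes[current], position) <= radio_range
            if name not in previous and in_range:
                previous[name] = current
                queue.append(name)
    return None


# Hypothetical snapshot: three riders strung out along the road and one
# TV motorbike acting as the gateway.
nodes = {
    "rider_a": (0, 0),
    "rider_b": (800, 0),
    "rider_c": (1500, 0),
    "moto_1": (2300, 0),
}
path = route_to_gateway(nodes, {"moto_1"}, "rider_a")
print(path)
```

In this snapshot rider_a is too far from the motorbike to reach it directly, so the data is relayed through the two riders in between. A real moving mesh would recompute this continuously as positions change and would weigh link quality, not just hop count.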
In the technical zone, data is transferred from the broadcaster’s truck to Dimension Data’s ‘Big Data’ truck. They use infrastructure as a service powered by 60+ virtual servers spread across 3 continents, delivering more than 350 million CPU cycles per second, all with 100% uptime. This data is then provided as a real-time data stream to the television broadcasters for use in live TV graphics and on a live-tracking website that allows commentators, media, and fans to track individual riders. Information provided includes the exact location of riders, their speed and the distance between them. Other data provided includes bib numbers, rider names, team names, and rider information such as current classification, as well as environmental data such as maps, terrain details, gradient and altitude changes, and localized weather. Media data includes biographical information on the cyclists as well as high-resolution photos and videos. The truck is a data center on wheels housing the infrastructure and a collection of applications and tools from multiple vendors, all tied together with custom software created by Dimension Data developers in Australia and South Africa. They run a 24-hour development cycle which allows new code to be deployed into the solution each day.
Some of the environments in which the telemetry devices have to work include heavy RF traffic. For example, the technical zone at the end of each stage has hundreds of Wi-Fi networks, thousands of mobile devices, and more than 50 TV broadcasters. The zone is, in short, the origin of the live TV feed for the race’s global audience. This creates a high RF-noise environment right at the finish line of every stage. The design principle was to shield the electronics from this static clutter by enclosing the core microcontroller unit within its own Faraday cage. I can vouch for all the cables (see below) which trailed along each walkway in the technical zone.
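As a rough illustration of how speed and the distance between riders can be derived from raw GPS fixes, the sketch below uses the standard haversine formula. It is an assumption on my part that positions arrive as (latitude, longitude, timestamp) tuples, and the function names are hypothetical. Note too that this gives the straight-line distance, whereas a race gap would really be measured along the road.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def speed_kmh(fix_a, fix_b):
    """Average speed between two fixes, each a (lat, lon, unix_seconds) tuple."""
    dist_m = haversine_m(fix_a[0], fix_a[1], fix_b[0], fix_b[1])
    dt_s = fix_b[2] - fix_a[2]
    if dt_s <= 0:
        raise ValueError("fixes must be in increasing time order")
    return dist_m / dt_s * 3.6  # convert m/s to km/h


# Two hypothetical fixes from the same tracker, ten seconds apart.
earlier = (0.0, 0.0, 0)
later = (0.001, 0.0, 10)
print(round(speed_kmh(earlier, later), 1), "km/h")
```

The same `haversine_m` call applied to two different riders' latest fixes gives an approximate distance between them.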
That Big Data truck communicates directly with Dimension Data’s global cloud platform. Once ingested and processed, the data is encrypted and sent over networks that are not Internet-connected. Security elements include best practice cloud and web security as part of an IT-as-a-service offering, regular security assessments, real-time threat management and on-premises network security in the Big Data truck. Tools to protect against malicious attacks and malware include next-generation anti-virus (NGAV), Intrusion Prevention/Detection Systems and advanced malware protection. This Big Data truck is the ‘nerve centre’ of the solution and is located in the technical zone at the finish of each stage of the race during the three weeks of the Tour.
This is also where the broadcasters are located. Incidentally, I was delighted to see my favourite ITV Tour de France highlights show being recorded live with Gary Imlach and Chris Boardman in this zone.
A much fuller technological overview of how Dimension Data capture the data and stream it live is outlined in their white paper. Dimension Data ultimately now feed new live data to the A.S.O.’s broadcast partners for 3.5 billion cumulative television viewers. This was also achieved without large capex investments in technology, as the solution is cloud-based and can be ‘put into hibernation’ until the next cycling race. As an engineer told me, if the Tour took the same route each year, you could possibly consider installing communications infrastructure, but of course the route changes each year. That will never change. The Tour tries to spread the joy to the small towns of France, and that is a nice thing to do.
The Icing on the cake – making predictions
Dimension Data can use machine learning and AI techniques to predict race developments and stage outcomes. Advanced analytics and coded algorithms combine historical rider data with actual race conditions to make data-driven race predictions, such as the estimated time of arrival at key points in the race and whether the peloton is likely to catch any breakaway groups. Building on the analytics platform and algorithms, they used a combination of telemetry data, race results, rider data, course information, and conditions data to predict race outcomes. They can also predict how hard the riders are working at any moment and what the difference in effort is between different groups. They can also build performance profiles to determine the attributes of different riders and what sort of races or stages they are most suited to. From this, they can estimate who is likely to do well on a given stage based on their profile, results, and the nature of the day’s route. The richer the dataset available, of course, the more factors the model can consider. After creating machine learning models based on data from the Tour, they also used five years of race results from an external source and 3rd party data services, incorporating factors such as the weather, to enrich the data with additional features that provided greater context to the raw data. The example below shows the data that is applied to enable the machine learning algorithm to make the catch prediction.
Overnight batch predictions are also run, taking into account the previous stages’ data. This enables Dimension Data to predict the stage favourites and apply the riders’ performance profiles to model the next day’s stage. They use the following technology stack to deliver these outcomes:
This application of machine learning allows the race organizer to create new revenue streams based on a growing interest in data from the race. They also provide a purpose-built commentators’ application which gives TV commentators at the Tour de France direct access to live race data for in-depth event coverage, such as rider positions, data insights and a live news feed, in a visually accessible format.
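To give a feel for what a catch prediction has to answer, here is a deliberately simple baseline of my own. It is not the machine learning model described above: it assumes both groups hold a constant speed, and the function name and parameters are hypothetical. A trained model would improve on this by learning from the telemetry, course and weather features mentioned in the text.

```python
def predict_catch(gap_km, break_kmh, peloton_kmh, km_to_finish):
    """Constant-speed estimate of whether the peloton catches the breakaway.

    gap_km:       distance along the road between breakaway and peloton
    break_kmh:    current speed of the breakaway
    peloton_kmh:  current speed of the peloton
    km_to_finish: distance the breakaway still has to ride
    Returns (will_catch, km_from_finish_at_catch); the second value is
    None when no catch is predicted.
    """
    closing_kmh = peloton_kmh - break_kmh
    if closing_kmh <= 0:
        return False, None  # the peloton is not gaining on the breakaway
    hours_to_catch = gap_km / closing_kmh
    break_travel_km = break_kmh * hours_to_catch
    if break_travel_km >= km_to_finish:
        return False, None  # the breakaway reaches the line first
    return True, km_to_finish - break_travel_km


# Hypothetical situation: 1.5 km gap, breakaway at 42 km/h, peloton at
# 45 km/h, with 25 km left to race.
will_catch, km_left = predict_catch(1.5, 42, 45, 25)
print(will_catch, km_left)
```

With these numbers the peloton closes at 3 km/h, needs half an hour to make up the gap, and the catch is predicted with 4 km to go.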
It truly was an eye-opening experience, both personally and technically. I have been teaching computer networks at Ulster University for 20 years. I also teach cyber security, so I had a particular interest in understanding how they created the technical solution. Dimension Data were very open in that respect, with every employee I met following up with rich information, white papers and links in the aftermath of my short, wonderful visit. I am an actual mega fan of le Tour de France. I never miss the UK highlights show, which I watch with my son Jack, who is a cyclist. Of course I was utterly gutted that our favourite rider, Mark Cavendish MBE, had to drop out a few days earlier, but meeting the other Dimension Data riders more than made up for that. I also have to say that I was extremely fortunate to get Peter Sagan, arguably the most popular rider left in the Tour, to pose for a selfie with me before the start of the stage, which he actually won. I was there to see that too and cheer him on.
Dimension Data have carried their success with the Tour de France into their broader digitization practices and, together with Cisco, into another project to provide protection for African rhinos. With the Connected Conservation model, the technology is designed to proactively protect the land against humans. The animals are not touched, and are left to roam freely while a ‘layered’ effect of sophisticated technology, people and gadgets protects them. They have also created a new Sports Practice that enables other events and venues to track and supplement performance information for their fans by generating insightful information on athletes and teams, as well as revolutionary customer engagements.
In the future, we may see more data being shared, such as each rider’s power output, pedalling cadence, respiration and heart rate. This information is sensitive, so teams are reluctant at present to share it, but I suspect ASO will persuade them of how engaging such information would be for the viewing public. One thing we do know is that viewers will become more immersed in the world’s biggest race. Roll on helmet-mounted real-time Virtual Reality feeds!