England won the Second Test against South Africa comfortably enough, but there was a frustrating spell before tea on the first day as Kagiso Rabada and Anrich Nortje added 35 for the ninth wicket. Having bowled relatively full earlier in the day, England switched to a short-pitched attack to no great effect. Notably it was a full-pitched ball from Ollie Robinson after tea that delivered the breakthrough as Nortje was lbw.
So why had England changed approach? Perhaps they had been swayed by the Test against India at Lord’s when they had successfully bounced out the tail, or perhaps it was a reaction to the nature of this season’s Dukes cricket balls, which have been losing menace more quickly than usual, demanding something different from the bowler. But there was also, seemingly, data that the South Africa tail was susceptible to short-pitched bowling. The problem is that if every ball is short-pitched, batters come to expect it and can set themselves for it; far more dangerous is the surprise short-pitched ball.
As the CricViz analyst Ben Jones put it: “You can’t just look at dismissals” – the Jimmy Anderson inswinger is all the more dangerous for following a series of outswingers. CricViz’s expected wickets model shows that good balls tend to take wickets regardless, but Jones acknowledges that context matters and sees that as one of the areas in which the use of data in sport has to improve.
Or take the yorker, which nobody doubts is the most effective ball in one-day cricket. The problem is that there is a tiny margin for error: too full and it’s a low full-toss, too short and it’s a half-volley, both very hittable. A batter anticipating a yorker can advance or retreat to change the length.
As Tim Wigmore and Freddie Wilde point out in Cricket 2.0, it was that anticipation, allied to the suspicion that Ben Stokes would try to make him hit to the longer leg-side boundary, that allowed Carlos Brathwaite to hit those four successive sixes to win the 2016 T20 World Cup final. The Chris Jordan over to Jimmy Neesham that went for 23 at the 2021 tournament, likewise, was the result of the yorker being predicted.
Similar problems have dogged data analysis in football almost from the start. Charles Hughes, the technical director of the FA whose 1990 book The Winning Formula confirmed direct football as official doctrine, drew his conclusions from the evidence of 109 matches involving “successful sides” – Liverpool, England Under-16s and Under-21s, and World Cup or European Championship matches involving Argentina, Brazil, England, the Netherlands, Italy and West Germany – between 1966 and 1986. He focused almost entirely on the 202 goals scored in those games – just as cricket analysis tends to focus on dismissals – and 87% came from moves of five passes or fewer. Therefore, he concluded, teams should try to limit moves to five passes or fewer.
Even leaving aside the startlingly small sample size and the selective nature of the data, there is an absence of nuance. Might it not be that what works for England Under-16s in a friendly in the mud and cold of a British winter is not necessarily appropriate for Brazil amid the heat and altitude of a World Cup in Mexico?
Hughes even noted that Brazil were the side most likely to score after a long string of passes, 32% of their goals coming from moves of six passes or more, with West Germany next on 25%. Given that those two sides had won six of the 13 World Cups to have been played, the obvious conclusion would seem to be that possession football is good for you, but Hughes did not pursue it.
Nor did he, or Charles Reep, the amateur statistician whose ideas Hughes developed, consider that direct balls may be more effective if they are used sparingly. Just as a batter can set themselves for persistent short-pitched bowling, or ready themselves for a string of yorkers, so a defence can drop deep and prepare for an aerial bombardment.
Just as the danger of the occasional bouncer may be enhanced by the surprise factor, by a batter trying to get forward having to adjust, so the threat of a long ball may be greater if a defence has been drawn out by a team holding possession. (And because almost nothing in sport is absolute, there are occasions when a batter is so spooked by short-pitched bowling, or a defence so rattled by a string of long balls, that the most effective tactic is the stifling pressure of a sustained barrage.)
Hughes and Reep were, to use the politest possible term, pioneers and have about as much to do with modern data analysis as Pliny the Elder does with modern medicine. But the issue of context is one with which statistics continues to struggle.
A coach at a Premier League side told me a story of his manager being convinced by their data department to operate a high line against a team with a notably quick forward, despite a first-choice centre-back having to be replaced by a veteran who was just returning from injury and hadn’t been quick on the turn even in his pomp.
They conceded three within 30 minutes and lost 3-0, but the analysts justified their advice by pointing out their team had won on xG (expected goals). That was only because, as the coach angrily replied, having scored with three early chances, the other team had no need to attack. They sat back, conserved energy and weren’t much bothered if they conceded a couple of half-chances: the game was over with an hour remaining. That’s not to say that xG is not a very useful tool – it is – merely that it doesn’t always give the whole picture.
CricViz’s Jones is clear that data analysis is not enough; it makes sense only when used alongside video analysis by those who understand the limits of what statistics can tell you. There are few absolute rights and few absolute wrongs and the meaning of everything is in part determined by its relationship to everything else. Context is vital; players are human. Sport is not an algorithm.