When is a £50bn fiscal black hole not a £50bn fiscal black hole? More often than you might think, according to a new book about government data that will leave readers with renewed cynicism about every statistic that comes out of a politician’s mouth.
Its author, Georgina Sturge, is a statistician at the House of Commons library. She has not set out here to expose attempts by rogue actors to bamboozle us deliberately with figures – although there are plenty of them. Instead, even more alarmingly, Bad Data explains how the ways in which we count, measure and record things are very often just not fit for purpose. How, for example, could Keir Starmer claim, in 2020, that there were 600,000 more children living in poverty than at the end of the last Labour government but Boris Johnson say there were 100,000 fewer? “These claims were both supported by the government’s official figures,” Sturge says.
As a member of the parliamentary research team, tasked by MPs with finding evidence to inform their policies or reinforce their prejudices, Sturge knows what we don’t know about the population. “We don’t know how many children are being homeschooled because they’re not counted.” We don’t know how many foreign nationals have overstayed their student visas. We didn’t have accurate unemployment figures throughout the 1980s, nor crime figures in the 1990s and 2000s. “And because we don’t count some things or don’t count them consistently, we can’t say whether they’re getting better or worse.” What happened, for instance, after Tony Blair pledged to end child poverty in 1999? “You can take your pick of answers: either ‘We don’t know’ or ‘It depends’.”
There are many reasons data can be unreliable even while it may be strictly accurate. With poverty, that’s often because of disagreements about its definition (which is partly what gives rise to the difference between Starmer’s and Johnson’s counts). With crime or NHS statistics, we see anomalies when the same people are responsible for reducing numbers and recording them: one police force showed a 27% decrease in “theft from a motor vehicle”, which did have a target, but a 407% increase for “vehicle interference”, which didn’t.
Sometimes a data set is too small, which leads to the not-very-meaningful information that there are 9,000 foreign nationals living in Lincoln, give or take 6,000; or the timeframe too short, showing that hate crime (a relatively new crime in UK law) “has doubled in five years”; or too long, suggesting that the UK is experiencing the “largest wave of immigration for nearly 1,000 years”. Often, reasonable assumptions are made, such as that most people coming to live and work in the UK fly into Heathrow, Gatwick or Manchester, which is where the International Passenger Survey is taken. When eight new countries joined the EU in 2004, nobody anticipated that Wizz Air would open cheap routes to Luton, Stansted, Birmingham and Sheffield Doncaster. Sturge estimates that hundreds of thousands of people were not counted when they arrived.
While all of this is ridiculous to the point of seeming laughable, Sturge is very effective at explaining, with human examples, how bad data affects lives. Readers of Hannah Fry’s Hello World or Caroline Criado Perez’s Invisible Women will be familiar with the notion that biased humans create biased artificial intelligence programmes. Here, we see their direct effects. Until 2020, there was a feedback loop in the algorithm the Home Office used to refuse visas to people at risk of overstaying: “One of the factors determining whether a country was on the ‘suspect’ list was how often visa applications from that country were refused. It kept refusing Nigerians on the basis that theirs was a ‘suspect’ nationality, but every time a Nigerian was refused … it fed the system’s existing suspicions.”
Sturge is admirably pragmatic about the difficulties of keeping tabs on populations. “It’s a messy business, counting people,” she says. She seems to fantasise about the “huge benefits of having a complete population register” as a first step towards collecting and joining up data, but is also clear about the potential dangers:
“In 2018 and 2019, a policy requiring voters to show ID at polling stations was piloted in selected areas in English local elections. Afterwards, when the government was asked for its response to the theory that people from ethnic minorities would find it harder to produce the necessary ID and hence be put off voting, it claimed the pilots showed ‘no impact on any particular demographic group’. But this was not strictly true. The pilots had not collected data on people’s ethnicity, so it was not possible to even test this theory.”
While some people are naturally fearful of handing over private information to the government, it is generally the lack of documentation that disadvantages people. In the 1950s and 60s, the government gave new arrivals from Commonwealth countries no documents to prove their status, setting a “trap” for a generation of immigrants and leading to the Windrush scandal. Sturge regards this as “a lesson in the importance of keeping … good data”.
For a House of Commons employee, Sturge is surprisingly and often refreshingly political. Referring to evidence that the prison population would have to increase by 15% to cause a reduction in crime of 1%, she says bluntly: “Prison clearly doesn’t ‘work’ on this metric.” She’s blistering about academics who allow bad data to slip through the peer review process, seeing academia sometimes as “a kind of conspiracy of people all scratching each others’ backs for a career boost”. Referring to the Wizz Air debacle, she recalls: “We couldn’t seem to count people coming in and this, to many people, seemed like a sign that immigration itself was out of control … Taking back control was ultimately what the EU Referendum was all about.” But she is neither party political nor obviously partisan. “The claim that we were sending the EU £350m per week was demonstrably false … [but the] Remain campaign too produced some sketchy numerical claims.”
The book is so full of examples, anecdotes and numbers, skipping around from grammar schools to policing to climate refugees and back again, that it tends to be overwhelming. And there is a frustrating lack of solutions. Clearly, we’re in an enormous muddle, but what should we do to get out of it? Spending a lot of money would be one way. Sturge is envious of the abundance of data available in football betting, and says: “Where there is the will – and big money – the quality of data we can get is really quite astonishing … and yet we don’t know how many people are eligible to vote, how many people died from Covid-19, or whether crime is going up or down.” But throwing premiership-level money at the problem doesn’t seem a priority for a country currently facing a 7% fall in living standards.
On the other hand, anyone who reads this book will subsequently go through life seeing phrases such as “a 7% fall in living standards” as absurd and in need of robust challenge, and Sturge should regard that as a victory – because the public and the media have our parts to play, too. We are right to demand honest, accurate and well-evidenced information from our politicians, but unfair to expect them to have all of the answers, all of the time. That only encourages the use of bad data. “It’s not about us all becoming minor experts,” she says. “It’s about us being curious and demanding explanations.”
It’s hard to criticise the book for being light on answers, then, when it is so good at inspiring curiosity and the inclination to challenge. As all good politicians know, the first step towards rectifying a problem is identifying, defining and quantifying it. And who better to do that than a House of Commons statistician with an axe to grind about making data better for everyone?
• Bad Data by Georgina Sturge is published by Little Brown (£20).