Late last year, the Justice Department joined the growing list of agencies to discover that algorithms don’t heed good intentions. An algorithm known as PATTERN placed tens of thousands of federal prisoners into risk categories that could make them eligible for early release. The rest is sadly predictable: Like so many other computerized gatekeepers making life-altering decisions — in pre-sentencing assessments, résumé screening, even judgments about healthcare needs — PATTERN seems to be unfair, in this case to Black, Asian and Latino inmates.
A common explanation for these misfires is that humans, not equations, are the root of the problem. Algorithms mimic the data they are given. If that data reflects humanity’s sexism, racism and oppressive tendencies, those biases will be baked into the algorithm’s predictions.
But there is more to it. Even if all the shortcomings of humanity were stripped away, equity would still be an elusive goal for algorithms for reasons that have more to do with mathematical impossibilities than backward ideologies. In recent years, a growing field of research in algorithmic equity has revealed fundamental — and insurmountable — limits to equity. The research has deep implications for any decision maker, human or machine.
::
Imagine two physicians. Dr. A graduated from a prestigious medical school, is up on all the latest research and carefully tailors her approach to each patient’s needs. Dr. B takes one cursory glance at every patient, says “you’re fine” and mails them a bill.
If you had to pick a doctor, the decision might seem obvious. But Dr. B has one redeeming attribute. In a sense, she is more fair: Everyone is treated the same.
This trade-off isn’t just hypothetical. In an influential 2017 paper titled “Algorithmic Decision Making and the Cost of Fairness,” the authors argue that algorithms can attain higher accuracy if they aren’t also required to perform equitably. The heart of their case is simple to grasp: Generally, everything becomes more difficult when constraints are added. The best cake in the world is probably more delicious than the best vegan cake in the world. The most accurate algorithm is probably more accurate than the most accurate equitable algorithm.
In the design of an algorithm, therefore, a choice must be made. It may not be as stark as the choice between Dr. A and Dr. B, but it’s of the same flavor. Are we willing to sacrifice quality for the sake of equality? Do we want a system that’s more fair or higher-performing? Understanding how best to walk this line between performance and fairness is an active area of academic research.
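To make the trade-off concrete, here is a toy sketch in Python (not the paper’s actual analysis; all data and numbers are invented). It searches for the most accurate decision thresholds on synthetic risk scores, first with no constraint, then with the added requirement that two groups’ false-negative rates roughly match:

```python
# Toy illustration: the best accuracy achievable under a fairness
# constraint can never exceed the best accuracy without one, because
# the constrained search chooses from a subset of the same options.
import itertools
import random

random.seed(0)

def make_group(n, base_rate):
    """Synthetic patients: a true label plus a noisy risk score."""
    people = []
    for _ in range(n):
        sick = random.random() < base_rate
        score = random.gauss(0.7 if sick else 0.4, 0.15)
        people.append((score, sick))
    return people

groups = {"A": make_group(2000, 0.2), "B": make_group(2000, 0.4)}

def evaluate(thresholds):
    """Overall accuracy, plus each group's false-negative rate."""
    correct = total = 0
    fnr = {}
    for name, people in groups.items():
        missed = sick_total = 0
        for score, sick in people:
            predicted = score >= thresholds[name]
            correct += predicted == sick
            total += 1
            if sick:
                sick_total += 1
                missed += not predicted
        fnr[name] = missed / sick_total
    return correct / total, fnr

candidates = [t / 100 for t in range(20, 81, 2)]
best_free = best_fair = 0.0
for ta, tb in itertools.product(candidates, repeat=2):
    accuracy, fnr = evaluate({"A": ta, "B": tb})
    best_free = max(best_free, accuracy)
    if abs(fnr["A"] - fnr["B"]) < 0.02:  # the fairness constraint
        best_fair = max(best_fair, accuracy)

print(f"best accuracy, unconstrained: {best_free:.1%}")
print(f"best accuracy, equal false-negative rates: {best_fair:.1%}")
```

Because the constrained search can only pick from a subset of the unconstrained options, its winning accuracy can never be higher; that is the cake-versus-vegan-cake logic in miniature.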
This tension also crops up in human decisions. Universities might be able to admit classes with higher academic credentials if they didn’t also value diverse student bodies. Equity is prioritized over performance. On the other hand, police departments often concentrate patrols in high-crime areas, even at the cost of over-policing communities of color. Performance is prioritized over equity.
Deciding whether equity or performance should be prioritized is not simple. But what the study of algorithms lays bare is that it is an unavoidable decision with real trade-offs. And these trade-offs often breed contention.
::
But what do “fairness” and “equity” actually mean? Algorithms require precision, but language can be ambiguous, creating another obstacle. Before you can be fair, you need to define what fair is. Although there are many ways to define fairness, they are pitted against one another in a rigid mathematical competition where not everyone can win.
To gauge whether an algorithm is biased, scientists can’t peer into its soul and understand its intentions. Some algorithms are more transparent than others, but many used today (particularly machine-learning algorithms) are essentially black boxes that ingest data and spit out predictions according to mysterious, complex rules.
Imagine a data scientist trying to understand whether a new algorithm for cancer screening is biased against Black patients. The new technology dispenses a binary prediction: positive or negative for cancer. Armed with three pieces of information about each patient — their race, the positive or negative prediction of the algorithm, and whether the patient truly has cancer — how can the data scientist determine if the algorithm is behaving equitably?
One reasonable way to probe the question is to see whether error rates differ between Black patients and white patients. Errors are costly in either direction. For instance, failing to diagnose cancer (a false negative) among Black patients at a higher rate than among white patients might be considered unacceptably discriminatory. A differing rate of false positives — which takes healthy patients down a pointless and costly rabbit hole — is also problematic. If an algorithm has equal rates of false positives and false negatives for Black and white patients, it is said to have attained equalized odds. That is one form of fairness.
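As a concrete illustration, here is a minimal Python sketch of that check, with a handful of made-up patient records standing in for the data scientist’s three pieces of information:

```python
# Hypothetical records: (race, algorithm's prediction, true diagnosis).
records = [
    ("Black", True,  True), ("Black", False, True),  ("Black", False, False),
    ("white", True,  True), ("white", True,  False), ("white", False, False),
]

def error_rates(group):
    rows = [(pred, truth) for race, pred, truth in records if race == group]
    sick = [pred for pred, truth in rows if truth]
    healthy = [pred for pred, truth in rows if not truth]
    false_negative = sum(not pred for pred in sick) / len(sick)    # missed cancers
    false_positive = sum(pred for pred in healthy) / len(healthy)  # false alarms
    return false_negative, false_positive

# Equalized odds holds only if both rates match across the groups.
for group in ("Black", "white"):
    fnr, fpr = error_rates(group)
    print(f"{group}: false-negative rate {fnr:.0%}, false-positive rate {fpr:.0%}")
```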
Another way to measure fairness is to check whether the algorithm’s predictions have the same meaning for Black and white patients. For example, if a negative prediction corresponds to a 90% chance that a white patient is cancer-free but only a 50% chance that a Black patient is, then the algorithm might reasonably be considered discriminatory. Conversely, an algorithm whose predictions carry the same cancer implications regardless of race might be considered fair. Fairness of this flavor is called calibration.
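A companion sketch shows how calibration asks a different question of the same three columns: among patients who received a given prediction, how often does it prove correct, group by group? Again, the records here are invented:

```python
# Hypothetical records: (race, algorithm's prediction, true diagnosis).
records = [
    ("Black", False, True),  ("Black", False, False), ("Black", True, True),
    ("white", False, False), ("white", False, False), ("white", True, True),
]

# Calibration asks: among patients handed a negative prediction, is the
# chance of actually being cancer-free the same for every group? (The
# analogous check applies to positive predictions.)
for group in ("Black", "white"):
    negatives = [truth for race, pred, truth in records
                 if race == group and not pred]
    cancer_free = sum(not truth for truth in negatives) / len(negatives)
    print(f"{group}: negative prediction -> {cancer_free:.0%} chance of no cancer")
```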
Here’s the problem: Researchers have shown that whenever two groups differ in their underlying rates of the condition being predicted, no imperfect algorithm can attain both types of fairness. Meeting one fairness goal necessitates violating the other. It is as hopeless as pinning down both sides of a seesaw.
These various quality measures are intricately entwined. For example, the algorithm could be tweaked to raise the bar for a diagnosis of cancer. There would be fewer false alarms, but patients receiving a negative result could no longer rest so easily. Combine these competing effects with the fact that predispositions for certain cancers may, in fact, differ between racial groups and an intractable puzzle emerges.
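The squeeze can be seen with nothing but arithmetic. The sketch below uses invented numbers: it fixes the same meaning for a prediction in both groups (a positive verdict means an 80% chance of cancer, a negative one a 10% chance, so the tool is calibrated), assigns the groups different underlying cancer rates, and then solves for the error rates those commitments force on each group:

```python
# Invented numbers: predictions carry the same meaning for both groups
# (calibration), but the groups' underlying cancer rates differ.
PPV = 0.80  # P(cancer | positive prediction), same for both groups
NPV = 0.90  # P(no cancer | negative prediction), same for both groups

def forced_error_rates(prevalence):
    """Derive the error rates implied by the shared PPV and NPV.
    The share of positive predictions q must satisfy
    q*PPV + (1 - q)*(1 - NPV) = prevalence."""
    q = (prevalence - (1 - NPV)) / (PPV - (1 - NPV))
    true_pos, false_pos = q * PPV, q * (1 - PPV)
    false_neg, true_neg = (1 - q) * (1 - NPV), (1 - q) * NPV
    fnr = false_neg / (true_pos + false_neg)  # missed cancers
    fpr = false_pos / (false_pos + true_neg)  # false alarms
    return fnr, fpr

for group, rate in [("Group 1", 0.20), ("Group 2", 0.45)]:
    fnr, fpr = forced_error_rates(rate)
    print(f"{group}: cancer rate {rate:.0%} -> "
          f"false-negative rate {fnr:.1%}, false-positive rate {fpr:.1%}")
```

Run it and the group with the higher cancer rate ends up with far fewer missed cancers but far more false alarms (roughly 11% versus 43%, and 18% versus 4%). Calibration holds and equalized odds breaks: the seesaw in action.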
These types of findings, called impossibility theorems, abound in algorithmic equity research. Although there are dozens of reasonable ways to define equity — equalized odds and calibration being just two — it’s unlikely that any handful of them can be met simultaneously. All algorithms are unfair according to some definition of fairness. Bias hunters therefore are guaranteed success. Seek and ye shall find.
These impossibilities don’t hold just for algorithms. The incompatibility of fairness definitions exists whether predictions are made by a sophisticated cancer-screening algorithm or a dermatologist using the human eye to examine skin tags. But the simple structure of algorithms — data in, decision out — has helped make it possible to study equity. It’s easy to ask for too much from algorithms, but that shouldn’t keep us from asking for anything at all. We have to be intentional and specific about the type of equity to pursue.
Even if attaining equity is fundamentally difficult, seeking it is not futile. Forcing an optimized algorithm to behave equitably might nudge it off the pinnacle of performance — but that may be a preferable trade-off. And for systems that are far from optimal and far from equitable, an alternative might exist that is better in both regards.
We are accustomed to considering social forces that undermine fairness: violent history, implicit bias and systemic oppression. Were we to shovel away these human contributors to inequity, we would still eventually hit impenetrable bedrock. These are some of the fundamental limits that algorithms are thudding against today. In seeking to improve equity, algorithms teach us that we can’t have it all.