Predictions — whether algorithmic or human — may not be fair

On Tuesday, while Americans nationwide elect the country’s next president, Californians will also decide where to steer their state’s criminal justice system. Voting on Proposition 25, they’ll choose whether to replace the use of bail money with a cashless system that relies on algorithms to determine who is released and who is detained while their cases unfold.

The choice raises challenging questions about criminal justice reform, but it also prompts deeper philosophical questions about the role of predictive algorithms in American society more broadly.

Algorithms permeate nearly every aspect of modern life. They recommend movies and news stories. They determine whose taxes are audited and which restaurants are inspected. And in courtrooms across the country, they assess the risk a defendant poses to public safety.


California’s peculiar form of direct democracy means that ordinary citizens are regularly charged with making nuanced and consequential policy choices. Proponents of Prop. 25 argue that cash bail unjustly punishes the poor and that algorithmic determinations of risk will lead to fairer outcomes. Opponents argue that the proposed change replaces one flawed system with another equally problematic one. The debate crosses party lines and has created strange bedfellows, with civil rights groups decrying the use of algorithms and siding with the bail industry to preserve the state’s current system.

In this debate and similar ones on the role of predictive algorithms, proponents and opponents often narrowly focus on the algorithms themselves. One side typically argues that algorithms are more accurate than humans, while the other often claims that algorithms encode the biases of their human designers and the data on which they are built.

But some of the most important — and often overlooked — issues in these discussions have little to do with algorithms.

A threshold question is whether decisions should be based on predictions at all, regardless of whether they are made by humans or algorithms. In many policy contexts, the very idea of basing important decisions on predictions can raise serious ethical concerns.


In the criminal justice system, judges have long restricted the freedom of defendants based on informal mental calculations of risk to public safety. That means people are jailed for acts they are predicted to commit in the future, not just offenses that they have committed in the past — a practice in tension with the basic principle that wrongdoing is necessary to justify punishment. In a 1987 split decision, the Supreme Court held that pretrial “detention” is not “punishment,” but Justice Thurgood Marshall, in dissent, called that distinction “merely an exercise in obfuscation.” When objections to prediction are sufficiently profound, no predictive process, whether algorithmic or human, may be appropriate.

If one does decide that predictions are warranted, a second question is how predictions should be used. Suppose, for example, that a defendant is predicted to have a 1 in 3 chance of being convicted of a new crime. Based on that prediction, a judge might order the individual detained. Alternatively, that same prediction could be used to prioritize the defendant for supportive services ranging from mental health counseling to substance use interventions. Policy makers, not algorithms, must determine how best to use predictions.

At the Stanford Computational Policy Lab, we’re partnering with a public defender’s office to use predictive algorithms to support their clients. In contrast to popular depictions of defendants skipping town, most miss court appointments for mundane and understandable reasons, like lacking access to a car or public transportation. Missed court appearances can result in harsh penalties, including arrest and incarceration. We’re using machine learning algorithms to identify those at risk of missing their appointments so we can offer them free door-to-door transportation to court and back. Accurate predictions are essential to efficiently allocate limited transportation funds to those who need them most.
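The allocation logic described above can be sketched in a few lines. This is an illustrative toy, not the lab’s actual system: the risk scores and the budget are hypothetical inputs, assumed to come from some upstream predictive model.

```python
# Toy sketch: given hypothetical predicted probabilities of a missed court
# appearance, offer a fixed number of free rides to those at highest risk.

def allocate_rides(predicted_risk, budget):
    """Return the IDs with the highest predicted risk, up to `budget` rides."""
    ranked = sorted(predicted_risk, key=predicted_risk.get, reverse=True)
    return ranked[:budget]

# Hypothetical scores from an upstream model (e.g., a logistic regression).
risk = {"A": 0.72, "B": 0.15, "C": 0.55, "D": 0.05, "E": 0.64}

print(allocate_rides(risk, budget=3))  # -> ['A', 'E', 'C']
```

The point of the sketch is that the prediction itself is only an input; the policy choice, here, spending the budget on the highest-risk individuals rather than, say, detaining them, is made outside the model.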


Finally, a third question — and the one most closely tied to the algorithms themselves — is how predictions should be made. While this question has many facets, a central issue is what factors should be used to ensure predictions are fair.

In a recent project, we asked a representative cross section of Americans whether they would prefer pretrial risk assessment algorithms that were based in part on one’s race and gender, or algorithms blind to that information. Ninety percent of the respondents expressed a preference for race-blind algorithms, almost uniformly stating that predictions based on protected traits were unfair; 65 percent stated a preference for gender-blind algorithms.

A preference for blind algorithms may often be reasonable, but in some instances predictions that treat protected groups alike can end up harming the very groups they were intended to protect. For example, gender-blind algorithms have been found to overstate the risk that women pose to public safety and, as a result, may lead to the detention of low-risk women while higher-risk men are released.
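A small arithmetic example shows why blindness can overstate one group’s risk. The numbers below are hypothetical, not from the study: if men and women with identical observable features reoffend at different rates, a gender-blind model must assign both groups the pooled rate.

```python
# Hypothetical illustration: pooling two groups with different base rates.
n_men, n_women = 800, 200          # hypothetical sample sizes
rate_men, rate_women = 0.30, 0.10  # hypothetical true reoffense rates

# A gender-blind model can only estimate one rate for everyone.
pooled = (n_men * rate_men + n_women * rate_women) / (n_men + n_women)

print(f"pooled (blind) estimate: {pooled:.2f}")              # -> 0.26
print(f"overstates women's risk by {pooled - rate_women:.2f}")  # -> 0.16
print(f"understates men's risk by {rate_men - pooled:.2f}")     # -> 0.04
```

Under these assumed numbers, every woman is scored at 0.26 despite a true rate of 0.10 — exactly the kind of distortion that can lead to detaining low-risk women.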


When we told our study participants that blind algorithms could potentially worsen such disparities, support for incorporating race and gender into algorithmic predictions increased substantially — a 34 percentage point increase in support of including race and an 8 percentage point increase in favor of including gender. These findings suggest that the criteria for designing “fair” algorithms are still in flux and depend on the specific contexts in which they are used; broad generalizations about algorithmic fairness seem inadequate.

The debate over predictive algorithms will only intensify as new possibilities emerge in the coming years. As Californians decide the future of their criminal justice system, it is tempting to focus on the algorithms themselves. But regardless of whether predictions are generated by algorithms or humans, we shouldn’t lose sight of how these predictions are used — or even whether we should be making predictions at all.

Sharad Goel is an assistant professor at Stanford University and director of the Stanford Computational Policy Lab; Julian Nyarko is an assistant professor of law at Stanford University; and Roseanna Sommers is an assistant professor of law at the University of Michigan.