- The American Conservative - http://www.theamericanconservative.com -

# Does Pre-Crime Have a Race Problem?

The company Northpointe has a system called COMPAS that tries to predict whether a criminal will reoffend. Based on 137 criteria [1], each offender is given a score from 1 to 10, with higher scores meaning greater risk. Various actors in the justice system use the algorithm to make important decisions about offenders’ fates. The system is proprietary, so its precise inner workings are something of a mystery.

Race is not one of the COMPAS criteria, but in May, using data provided by Broward County, Fla., the journalism outfit ProPublica accused COMPAS of racial bias in a lengthy article [2] accompanied by a complicated methodological appendix [3]. (Logistic regression! Cox proportional hazard models!) The company responded with its own long, dense report [4], to which ProPublica has now replied [5].

Everyone is making this far more convoluted than it needs to be. Whatever problems there are with COMPAS, it’s relatively simple to demonstrate that it’s not racially biased in the most straightforward sense of the term. What’s more, it’s mathematically impossible to devise a rating system that will both (A) pass ProPublica’s purported test for bias and (B) provide scores that in practice mean the same thing for all racial groups. If COMPAS is biased, any system that classifies offenders this way has to be.

The key question is this: If a black person and a white person both receive the same COMPAS score, do they actually have different chances of recidivating? In a word, no. If anything, blacks with a given score often have higher reoffending rates, implying a slight bias against whites.

[Chart: recidivism rate at each COMPAS score, for black and white defendants] [6]

Source: Author’s calculations based on ProPublica’s data [7].
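The comparison behind the chart can be sketched in a few lines of Python. This is only an illustration of the idea (group people by race and score, then compare reoffending rates within each cell), not the author's actual calculation; the function name, the record layout, and the toy records are all assumptions made for the example.

```python
from collections import defaultdict

def reoffense_rate_by_score(records):
    """Return {(group, score): share of people in that cell who reoffended}.

    records: iterable of (group, score, reoffended) tuples (hypothetical layout).
    """
    totals = defaultdict(int)
    reoffended = defaultdict(int)
    for group, score, did_reoffend in records:
        totals[(group, score)] += 1
        reoffended[(group, score)] += int(did_reoffend)
    return {cell: reoffended[cell] / totals[cell] for cell in totals}

# Made-up toy records, NOT ProPublica's data.
toy_records = [
    ("A", 3, False), ("A", 3, True), ("A", 8, True), ("A", 8, True),
    ("B", 3, False), ("B", 3, True), ("B", 8, True), ("B", 8, False),
]
rates = reoffense_rate_by_score(toy_records)
for (group, score), rate in sorted(rates.items()):
    print(f"group {group}, score {score}: {rate:.0%} reoffended")
```

If a score means the same thing for every group, the rates for a given score should be roughly equal across groups.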

Those who read ProPublica’s report will probably be surprised by this chart, because ProPublica emphasized different numbers that, at first blush, looked quite damning. The mystery is solved with a thorough look at where those numbers come from. Here I’ll focus on the claim that 23.5 percent of whites, but 44.9 percent of blacks, were “labeled higher risk, but didn’t reoffend.” (“Low” scores are 1-4 in ProPublica’s data, while “higher” scores are 5-10.)

These are what’s called “false positive rates.” Each is calculated by dividing the number of false positives (those who were classified “higher” but who didn’t actually recidivate) by the total number of people who didn’t recidivate. The rate answers the question: if you randomly pick someone who ultimately didn’t recidivate, what are the chances that they initially had a COMPAS score of at least 5?
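As a minimal sketch of that definition, assuming each person is recorded as a hypothetical `(score, reoffended)` pair and using the 5-and-up cutoff described above:

```python
def false_positive_rate(records, threshold=5):
    """Share of non-reoffenders whose score was at or above the threshold.

    records: iterable of (score, reoffended) pairs (hypothetical layout).
    """
    non_reoffenders = [score for score, reoffended in records if not reoffended]
    if not non_reoffenders:
        raise ValueError("no non-reoffenders, so the rate is undefined")
    false_positives = sum(1 for score in non_reoffenders if score >= threshold)
    return false_positives / len(non_reoffenders)

# Made-up toy records for illustration only.
toy = [(2, False), (6, False), (7, True), (3, False), (9, False), (1, True)]
fpr = false_positive_rate(toy)
print(f"false positive rate: {fpr:.0%}")
```

Note that people who did reoffend never enter the calculation; only the non-reoffenders are counted, in both the numerator and the denominator.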

The problem is that in Broward County, for whatever reason, blacks have higher recidivism rates than whites. Given that, any accurate algorithm must classify more of them as higher-risk. This necessarily means that blacks will have a higher false-positive rate. Here’s why.

Recall that the numerator is the number of false positives, i.e., people who were classified as higher risk but didn’t recidivate. Obviously, people classified as low-risk can’t be false positives, because they aren’t positives at all. So whenever you classify more people as higher-risk—even if you’re classifying them correctly—you’re increasing the number of people who will become false positives some percentage of the time. In short, the numerator is going to go up when the recidivism rate goes up.

The denominator, meanwhile, is the total number of people who didn’t recidivate. So when the recidivism rate goes up, the denominator automatically goes down, heightening the effect of the increased numerator.
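The two effects can be written as one formula. The symbols here are assumptions introduced for illustration: $p$ is the share of a group labeled higher-risk, and $r_H$ and $r_L$ are the reoffending rates within the higher and low categories, taken to be the same for every group and both below 1.

$$
\mathrm{FPR}(p) = \frac{p\,(1 - r_H)}{p\,(1 - r_H) + (1 - p)\,(1 - r_L)},
\qquad
\frac{d\,\mathrm{FPR}}{dp} = \frac{(1 - r_H)(1 - r_L)}{\bigl[\,p\,(1 - r_H) + (1 - p)\,(1 - r_L)\,\bigr]^{2}} > 0 .
$$

Under these assumptions the group's overall recidivism rate is $p\,r_H + (1 - p)\,r_L$, which rises with $p$ whenever $r_H > r_L$. So a group with a higher recidivism rate has a larger $p$ and therefore a higher false positive rate, even though the categories mean the same thing for everyone.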

For a more concrete example, say we have an amazingly precise test. There are only two categories, with exactly 75 percent of the higher category and 40 percent of the lower category reoffending. We give the test to 1,000 whites and 1,000 blacks; 500 whites but 600 blacks turn out to be higher-risk. Even in this mathematically idealized scenario we’ll have a higher numerator for blacks (150 vs. 125 false positives, or 25 percent of those labeled high-risk) and a higher denominator for whites (425 vs. 390, or the total number of people who didn’t reoffend). The false positive rates will be 38 percent for blacks and 29 percent for whites.
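The arithmetic in this example can be checked with a short script. The function and variable names are hypothetical; the inputs are the figures given in the paragraph above.

```python
def group_stats(n_total, n_high, rate_high=0.75, rate_low=0.40):
    """Return (false positives, non-reoffenders, false positive rate)."""
    n_low = n_total - n_high
    # Higher-risk people who did not reoffend.
    false_positives = n_high * (1 - rate_high)
    # Everyone who did not reoffend, from either category.
    non_reoffenders = false_positives + n_low * (1 - rate_low)
    return false_positives, non_reoffenders, false_positives / non_reoffenders

white = group_stats(1000, 500)
black = group_stats(1000, 600)
for label, (fp, non, fpr) in (("white", white), ("black", black)):
    print(f"{label}: {fp:.0f} / {non:.0f} = {fpr:.1%}")
```

This gives 125 / 425 for whites and 150 / 390 for blacks, or about 29.4 and 38.5 percent, which round to the 29 and 38 percent quoted above.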

Ironically, the only way to avoid the problem is to make the algorithm biased in the obvious sense, so the categories mean different things depending on what race the offender belongs to. In the example above, we could randomly reassign about one-quarter of the high-risk blacks to the low-risk category to bring their false-positive rate down to that of whites. But then a low-risk designation would mean roughly a 50 percent chance of recidivism, instead of 40 percent, when the offender was black.
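A rough sketch of that reassignment, using the same illustrative figures. It assumes the people moved are picked at random, so they keep reoffending at the higher-risk rate; the function and parameter names are hypothetical.

```python
def reassignment_needed(n_total=1000, n_high=600, target_fpr=125 / 425,
                        rate_high=0.75, rate_low=0.40):
    """Return (share of higher-risk group to relabel, new low-risk reoffense rate)."""
    n_low = n_total - n_high
    # Relabeling does not change who actually reoffends, so this stays fixed.
    non_reoffenders = n_high * (1 - rate_high) + n_low * (1 - rate_low)
    # Solve n_high * keep_share * (1 - rate_high) / non_reoffenders = target_fpr.
    keep_share = target_fpr * non_reoffenders / (n_high * (1 - rate_high))
    moved = n_high * (1 - keep_share)
    # Randomly moved people still reoffend at the higher-risk rate.
    new_low_reoffenders = n_low * rate_low + moved * rate_high
    new_low_rate = new_low_reoffenders / (n_low + moved)
    return 1 - keep_share, new_low_rate

share_moved, new_low_rate = reassignment_needed()
print(f"share relabeled: {share_moved:.1%}")
print(f"new low-risk reoffense rate: {new_low_rate:.1%}")
```

Under these assumptions, a bit under one-quarter of the higher-risk group has to be relabeled, and the low-risk category then carries a reoffending rate of roughly 49 percent rather than 40.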

To be clear, there are some serious objections to systems like COMPAS. Some are uncomfortable with using algorithms to determine criminal-justice outcomes, period, even if it could reduce incarceration by identifying those who don’t pose a threat to society. Others object [8] (quite correctly in my view) to the fact that COMPAS’s details are proprietary, making the results rather hard to dispute in court. Still others might argue against using official recidivism data, saying it comes from a biased justice system.

We should debate these issues. But let’s not demand the impossible.

Robert VerBruggen is managing editor of The American Conservative. Follow @RAVerBruggen [9]

10 Comments To "Does Pre-Crime Have a Race Problem?"

#1 Comment By SteveM On August 2, 2016 @ 10:32 am

Robert VerBruggen is describing a well-documented statistical aberration often found in the social sciences called Simpson’s Paradox:

[10]

That ProPublica demonstrates statistical naïveté is not surprising, since, “There are three kinds of lies: lies, damned lies, and statistics.” And lying via gamed statistics is the engine that keeps the Elites in power.

Even if Simpson’s Paradox is explicitly pointed out to ProPublica as a fatal flaw in their analysis, they would do what good hack propagandists do, simply ignore the compelling counter-argument and pretend it does not even exist.

The clueless Left-leaning MSM would happily follow along and not deviate from the bogus conclusions. (Of course movement Conservatives also have no problem using corrupt statistical analysis to advance their positions.)

The last thing the Cronies on both the Left and the Right want is for the Truth to set us free.

#2 Comment By Hankest On August 2, 2016 @ 4:34 pm

SteveM, you claim they would “ignore the compelling counter-argument and pretend it doesn’t exist.” Why not test your hypothesis by going on ProPublica and either commenting as you do here or writing to one of the authors directly?

They responded to Northpointe.
[11]

Maybe you or Mr. VerBruggen will receive the same courtesy.

#3 Comment By Brian On August 2, 2016 @ 4:40 pm

I don’t know. Without spending much time reading up on it, it seems like the core complaint would be that the consequences of the test’s inaccuracy fall more heavily on one group or another.

Statistics are nice but the rule of law means everyone, individually, is treated equally.

I am not held responsible for the crimes of other people. Whether or not I have something in common with them that can be easily measured and put in a database is irrelevant, no matter how much you try to obfuscate the consequences.

Let’s say we devise a ridiculous test. We take crime stats and divide them up into some broad groups by stuff we can easily measure. Maybe race, family income, single parent, parents’ education level, etc.

Now we take what percentage of each group commit crimes statistically and for our future crime prediction just accuse a corresponding, random number of young people out of each group of being “future criminals”

So if 10% of group A and 0.1% of group B have historically committed serious crimes, we just accuse 20% of young people in group A and 0.01% of young people in group B, chosen at random, of being criminals and lock them up.

If someone says the burden of this ridiculously flawed test disproportionately impacts people in group A, is that right or wrong?

#4 Comment By Brian On August 2, 2016 @ 4:42 pm

Whoops – should say

So if 20% of group A and 0.01% of group B have historically committed serious crimes, we just accuse 20% of young people in group A and 0.01% of young people in group B, chosen at random, of being criminals and lock them up.

#5 Comment By tz On August 2, 2016 @ 10:02 pm

A blind, proprietary algorithm might be able to predict recidivism but does nothing to alter the outcome.

Calvinistic predestination?

Moreover, the inputs will be racially biased. Credit scores, the neighborhoods, families (single mothers vs. mother and father).

Asking if one can draw an accurate path is not the same as asking if one can find a better one, either proactively or retroactively.

And if this works, what does the model say about refugees, illegal immigrants, and non-European immigrants in general? Trump has been criticized for noting they can’t be vetted. But what if algorithms can do so, but we don’t like the result?

#6 Comment By collin On August 3, 2016 @ 10:08 am

Of course we should analyze race in the data and statistics. These scores probably don’t change the opinions much for an experienced parole board member or law enforcement. However, it is probably a good quick score, like a credit score, to help authorities understand the prisoner’s situation in a productive manner.

1) Looking at the graph the impact of race (outside Score 3) appears to be minor and it would be nice to know if it is statistically significant. Also what do other races show?

2) A fair question is why the whites are biased for low scores while African-Americans are biased in high scores.

#7 Comment By grumpy realist On August 3, 2016 @ 12:15 pm

Based on certain Supreme Court decisions, I think they would look unfavorably on “black box” decisions.

Release the algorithms, guys. Let’s see what’s under the hood. For all that we know, you have a box of mice trained to throw dice as your calculation device.

#8 Comment By John Smith On August 4, 2016 @ 7:08 pm

There is only one race, the human race.

The black members of our country, 12.5% of the pop., commit over 50% of all crime, especially violent crime.

Please stop committing crimes and then we’ll discuss what this author would like to discuss.

#9 Comment By Loneoak On August 17, 2016 @ 3:07 pm

“The key question is this: If a black person and a white person both receive the same COMPAS score, do they actually have different chances of recidivating? In a word, no. If anything, blacks with a given score often have higher reoffending rates, implying a slight bias against whites.”

You misunderstand recidivism here, and that makes all the difference. Recidivism is a measure of being arrested again. Whether recidivism is a race-neutral proxy for reoffending depends on whether arrest rates and other police-civilian interactions are race-neutral. I think we know the answer to that.

#10 Comment By frumpy realist On November 20, 2016 @ 8:33 pm

Robert, your analysis has been confirmed in a journal the US government publishes:
Flores et al., “False Positives, False Negatives, and False Analyses: A Rejoinder to ‘Machine Bias: There’s Software Used Across the Country to Predict Future Criminals. And it’s Biased Against Blacks,’” Federal Probation Journal, September 2016.