IDEAS

Trumpian Internet data detectives outsmart themselves

A case of numbers being exactly what they seem.

Business expenses, credit card transactions, and other data that can fall over wide ranges tend to comport with Benford's Law. Data that falls within narrower bounds, like human heights or precinct-level vote totals, does not.
ASDF - stock.adobe.com

As the Trump campaign offered up dubious anecdotes of election-rigging, Trump-friendly Internet data detectives raised an alarm of their own: allegations of voter fraud based purely on mathematics. One such analysis relied on Benford’s Law, a tool of forensic accounting ordinarily used to identify when records have been fabricated. After scrutinizing precinct-level vote totals in Illinois, Wisconsin, and Pennsylvania, the amateur data sleuths invoked Benford’s Law to claim that Biden’s votes seemed fishy whereas Trump’s looked genuine.

Using a method designed to expose white-collar financial crimes to catch the Biden campaign supposedly cooking the books has obvious appeal to Trump supporters. Alas, the argument is nonsense.

Benford’s Law concerns the first digits of a group of numbers; with the numbers 329, 490, and 1,232, we would focus only on the 3, 4, and 1. Counterintuitively, the law predicts that in many circumstances — business expenses, population sizes, lengths of rivers, incidences of disease — the leading digits 1 through 9 will not occur equally often. Instead, 1 will be the leading digit most frequently, around 30 percent of the time, with 2 the next most frequent (18 percent), and so on. That this distribution is so unexpected is what makes Benford’s Law so valuable. If you were trying to fake some financial figures to look “random” without knowing this fact, a mismatch with Benford’s Law could be a dead giveaway.
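
The percentages quoted above follow from the standard statement of Benford's Law, under which the leading digit d is expected with probability log10(1 + 1/d). The short Python sketch below is only an illustration: the function names `leading_digit` and `benford_probability` are made up for this example and do not come from the analyses discussed here.

```python
import math


def leading_digit(n: int) -> int:
    """Return the first digit of a nonzero integer, e.g. 1232 -> 1."""
    return int(str(abs(n))[0])


def benford_probability(digit: int) -> float:
    """Expected share of numbers whose leading digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)


# The three numbers from the text reduce to the digits 3, 4, and 1.
print([leading_digit(n) for n in (329, 490, 1232)])

# Print the expected frequency of each leading digit.
# Digit 1 comes out near 30.1% and digit 2 near 17.6%.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The nine probabilities sum to exactly 1, since the terms telescope to log10(10).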

Why, exactly, the first digits show up with these same frequencies in different settings is a bit mysterious, but a common feature heralding the arrival of the Benford distribution is exponential growth. Imagine tracking a population of bacteria that doubles every day, so on the first day there was one cell, then two, four, eight, 16, and so on. Any of these numbers starting with a 5, 6, 7, 8, or 9 would be followed immediately by a number starting with 1 as a result of the doubling; for example, 512 doubles to 1,024 and 8,192 doubles to 16,384. So it must be that in the overall list there are about as many leading-digit 1s as there are 5s through 9s combined.
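
The doubling argument can be checked directly. The sketch below is a rough illustration: the 1,000-day horizon and the function name `doubling_digit_counts` are arbitrary choices made for this example. It tracks a population that starts at one cell and doubles daily, then compares how often the leading digit is 1 with how often it is 5 through 9.

```python
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first digit of a positive integer."""
    return int(str(n)[0])


def doubling_digit_counts(days: int) -> Counter:
    """Count leading digits of a population that starts at 1 and doubles daily."""
    counts = Counter()
    population = 1
    for _ in range(days):
        counts[leading_digit(population)] += 1
        population *= 2
    return counts


counts = doubling_digit_counts(1000)
ones = counts[1]
five_to_nine = sum(counts[d] for d in range(5, 10))

# Every 5-9 is followed the next day by a 1, so the two tallies
# can differ by at most one (from the ends of the list).
print("leading 1s:", ones)
print("leading 5s through 9s:", five_to_nine)
print("share of 1s:", ones / 1000)
```

Because every value starting with 5 through 9 doubles into a value starting with 1, the two tallies stay within one of each other, and the share of 1s lands near the 30 percent that Benford's Law predicts.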

Trump supporters claimed to have found that Biden’s vote totals in various places had a suspiciously low number of 1s and 2s as the leading digits. But precinct-level vote totals don’t have the properties that would trigger Benford’s Law. Instead, the numbers are constrained between certain bounds. As mathematicians and political scientists have pointed out, most of the precincts in Chicago, for example, reported between 300 and 700 total votes, of which the Biden-Harris ticket won approximately 82 percent. The ticket’s vote counts across precincts therefore contained a perfectly reasonable abundance of leading-digit 3s, 4s, and 5s, and not many 1s. Meanwhile, the 17 percent vote share won by Trump and Pence in Chicago generated more 1s, simply because their numbers were typically fewer than 200 votes per precinct. In other words, Benford’s Law doesn’t apply at all here.
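
A toy simulation shows the effect of these bounds. The sketch below uses made-up data, not real election returns: it assumes precinct totals spread uniformly between 300 and 700 votes and applies fixed 82 percent and 17 percent shares to every precinct. Both are simplifications chosen only to illustrate the point, and the function name `simulate_leading_digits` is hypothetical.

```python
import random
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first digit of a positive integer."""
    return int(str(n)[0])


def simulate_leading_digits(share: float, precincts: int = 2000, seed: int = 0) -> Counter:
    """Tally leading digits of a candidate's votes across simulated precincts.

    Assumptions (illustrative only): each precinct's total is drawn uniformly
    from 300 to 700, and the candidate wins the same share everywhere.
    """
    rng = random.Random(seed)  # same seed -> same precinct totals for both candidates
    counts = Counter()
    for _ in range(precincts):
        total_votes = rng.randint(300, 700)
        candidate_votes = round(total_votes * share)
        counts[leading_digit(candidate_votes)] += 1
    return counts


biden = simulate_leading_digits(0.82)  # vote counts fall between 246 and 574
trump = simulate_leading_digits(0.17)  # vote counts fall between 51 and 119

print("82% share:", sorted(biden.items()))
print("17% share:", sorted(trump.items()))
```

Under these assumptions, the 82 percent candidate can never post a leading 1, because that candidate's count in any precinct falls between 246 and 574. The 17 percent candidate picks up leading 1s whenever a count reaches 100. Neither pattern says anything about fraud; both follow from the bounds on precinct size.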

By tabulating the distribution of leading digits in order to lob allegations of fraud, Trump supporters succeeded only at finding creative new ways to remind us how badly he lost.

Aubrey Clayton is a mathematician living in Boston and the author of the forthcoming book “Bernoulli’s Fallacy.” Follow him on Twitter @aubreyclayton.
