
How to get better COVID-19 infection data without universal testing

There is a scientifically sound, far less costly approach at our disposal: conducting COVID-19 tests on a representative sample of the population.

Lesley Becker/Globe Staff; Adobe

The state of Massachusetts could benefit from more granular data on COVID-19 infections and deaths, but far more problematic is the fact that epidemiologists are basing their COVID-19 projections on imperfect data regarding the number and severity of infections. We know how many people have tested positive, but because tests are in limited supply and have been administered to relatively few people, that count is probably a far cry from the true number of people who have been infected thus far in the state and across the nation.

Projections of acute care needs and deaths are bouncing around like a pinball from day to day and from one epidemiological model to the next. Responsible policy makers necessarily act on the basis of plausible worst-case scenarios to protect public health, but what counts as plausible remains uncertain.


Calls have been widely sounded for frequent testing of the entire US population. Such an initiative is not yet possible, due to insufficient supplies, infrastructure, and human resources to conduct and interpret tests.

But universal testing is not necessary. There is a scientifically sound, far less costly approach at our disposal: conducting COVID-19 tests on a representative sample of the population.

Massachusetts has a population of nearly 7 million, but a random sample of 5,000 residents is large enough to determine the prevalence of COVID-19 infection within a margin of error of 1.5 percentage points. The information yielded by testing such a sample would contribute decisively to our epidemiological projections and related policies to prevent and control the spread of this too-often deadly virus.
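The arithmetic behind that margin of error can be checked with a short calculation. The sketch below is illustrative only and rests on assumptions the article does not state: a simple random sample, a 95 percent confidence level (z = 1.96), the normal approximation for a proportion, and the worst-case prevalence of 50 percent, which gives the widest possible margin. The function name and its parameters are our own.

```python
import math


def margin_of_error(sample_size, prevalence=0.5, z=1.96, population=None):
    """Half-width of a normal-approximation confidence interval for a proportion.

    prevalence=0.5 is the worst case (widest interval); z=1.96 corresponds
    to an assumed 95 percent confidence level.
    """
    standard_error = math.sqrt(prevalence * (1 - prevalence) / sample_size)
    if population is not None:
        # Finite population correction; nearly 1 when the sample is tiny
        # relative to the population, as it is here.
        standard_error *= math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error


# 5,000 residents sampled from roughly 7 million (figures from the article).
moe = margin_of_error(5_000, population=7_000_000)
print(f"Worst-case margin of error: {100 * moe:.2f} percentage points")
```

Under these assumptions the worst-case margin comes out a little under 1.5 percentage points, and it would be narrower still if true prevalence is well below 50 percent. The size of the state's population barely matters; the sample size drives the precision.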

The sampling approach poses two potential challenges: generating a random sample of the population and managing the logistics of testing the sample.

Generating a viable random sample from scratch can be a significant undertaking, but in this case, the US Census Bureau has already done the work for us. The bureau’s American Community Survey, which includes data on more than 3 million Americans, is collected yearly by phone, personal visit, and the Internet and is designed to generate vital information that generalizes to the entire American population.


The 2019 American Community Survey includes over 75,000 Massachusetts residents who have already provided substantial information about their race, ethnicity, age, sex, living arrangements, education, work status, and income. Questions on those items would not need to be repeated. Instead, supplementary information could be gathered on COVID-19 symptoms, known infection, and current patterns of social distancing behavior.

We have developed a supplemental questionnaire and pilot-tested it extensively. It takes less than seven minutes to complete.

Once a supplementary American Community Survey questionnaire is in place, the next step is to deploy health workers to visit a random sample of respondents to conduct COVID-19 tests on consenting adult participants. Testing should be for current infections as well as for the antibodies that indicate past infection and probable immunity.

This approach would place a minimal burden on the Census Bureau. Following a presidential directive, the bureau would merely need to contact the selected individuals to seek their consent to participate in this vitally important data-gathering activity.

When linked together, the existing American Community Survey data, supplementary data, and COVID-19 test results will inform our real-time understanding of the population-level prevalence of COVID-19 infection and symptoms and their relationship with patterns of work and social distancing for different demographic groups.


More accurate data will show whether actual numbers of infections are two times, 10 times, or perhaps even 20 times the number of positive tests. The correct multiplier is critical to policies and behaviors aimed at preventing and controlling infections and will help us develop suitable approaches to cushion their social and economic impacts. A high multiplier would, in fact, convey some good news, since it would imply that a larger number of people were infected but either asymptomatic or experiencing relatively mild symptoms, indicating we are somewhat closer to the safe harbor of herd immunity.
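To make the multiplier concrete, here is a minimal sketch of how it could be computed from sample results. Every number in it is hypothetical and chosen only for round arithmetic; none comes from actual test data. It also assumes the test is perfectly accurate and that the sample measures everyone ever infected (for example, through antibody testing), which real studies would need to adjust for.

```python
def infection_multiplier(sample_positives, sample_size, population, confirmed_cases):
    """Ratio of estimated total infections to officially confirmed cases."""
    # Share of the random sample that tested positive.
    prevalence = sample_positives / sample_size
    # Scale the sample prevalence up to the whole population.
    estimated_infections = prevalence * population
    return estimated_infections / confirmed_cases


# Hypothetical inputs: 150 positives among 5,000 sampled residents (3 percent),
# a population of 7 million, and 21,000 confirmed cases to date.
multiplier = infection_multiplier(
    sample_positives=150,
    sample_size=5_000,
    population=7_000_000,
    confirmed_cases=21_000,
)
print(f"Estimated infections per confirmed case: {multiplier:.1f}")
```

With these made-up inputs, 3 percent of 7 million is 210,000 estimated infections, or about 10 for every confirmed case. The sampling margin of error on prevalence carries through to the multiplier, so any real estimate should be reported as a range rather than a single number.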

Better data will allow public health experts and government officials to base their estimates and projections on much stronger evidence than they have today, to better customize current policies to realities on the ground, and to more accurately inform future epidemic responses.

There is no need to continue bemoaning the absence of meaningful data on the spread of COVID-19. A dramatic improvement in the nature, quantity, and reliability of data is within reach and could be gathered in time to make a real difference.

David E. Bloom is professor of economics and demography and David Canning is professor of population sciences at the Harvard T.H. Chan School of Public Health.