Data breach at genealogy site has privacy experts worried

The GEDmatch breach shows what can go wrong when stored genetic information isn’t adequately safeguarded. (Cayce Clifford/Bloomberg)

The peculiar matches began early on a Sunday morning. Across the world, genealogists found that they had numerous new relatives on GEDmatch, a website known for its role in helping crack the Golden State Killer case.

New relatives are typically cause for celebration among genealogists. But upon close inspection, experienced users noticed that some of the new relatives seemed to be the DNA equivalent of a Twitter bot or a Match.com scammer; the DNA did things that actual people’s DNA should not be able to do.

Others seemed to be suspected murderers and rapists, uploaded by genealogists working with law enforcement. Users knew that police sometimes used the site to try to identify DNA found at crime scenes. But users found the new profiles strange because they also knew that profiles made for law enforcement purposes were supposed to be hidden to prevent tipping off or upsetting a suspect’s relatives amid an investigation. What really drew attention, however, was the fact that all 1 million or so users who had opted not to help law enforcement had been forced to opt in.

GEDmatch, a long-standing family history site containing around 1.4 million people’s genetic information, had experienced a data breach. The peculiar matches were not new uploads but rather the result of two back-to-back hacks, which overrode existing user settings, according to Brett Williams, chief executive of Verogen, a forensic company that has owned GEDmatch since December.

Although the growth of genealogy sites has slowed slightly in recent years, their use by police has increased. After authorities in California used GEDmatch in 2018 to identify a suspect in the decadeslong Golden State Killer case, police departments across the country began to dig through their cold case files in the hopes that this new technique could solve old crimes.

And GEDmatch was often their preferred site. Unlike the genealogy services Ancestry and 23andMe, which are marketed to people who are new to using DNA to learn about themselves, GEDmatch caters to more advanced researchers. The site appeals to police because it allows DNA that has been processed elsewhere to be uploaded. Verogen has a long history of working with law enforcement, and the acquisition of GEDmatch further solidified this collaboration.

Scientists and genealogists said the GEDmatch breach — which exposed more than 1 million additional profiles to law enforcement officials — offers an important window into what can go wrong when those responsible for storing genetic information fail to take necessary precautions.

In an interview, Williams said that the first breach occurred early on July 19. After shutting down the site, his team “covered up the vulnerability,” he said, and brought it back online, but only briefly. “On Monday we took the site down again because it was clear the hackers were trying again,” he said.

This time the site remained down for nearly a week. “We’re taking an abundance of caution because we don’t want to end up in the same situation again,” Williams said.

Williams said he had hired an outside security team and contacted the FBI to see if the agency would investigate. The FBI did not respond to a request for comment.

All was far from resolved when the site’s settings were restored, said Debbie Kennett, a genealogist in England who wrote about the breach on her blog. People are stuck with their DNA for life, she said in an interview. “Once it’s out there, it’s not like an e-mail address you can change.” Because genetic data is interconnected, she added, exposing any one person’s information can potentially affect their family members too.

In a paper published last year, Michael Edge, a professor of biological sciences at the University of Southern California, and fellow researchers warned several genealogy websites that they were vulnerable to data breaches.

“Of course, hacks happen to lots of companies, even entities that take security very seriously,” he said. “At the same time, GEDmatch’s, and eventually Verogen’s, response to our paper didn’t inspire much confidence that they were taking it seriously.” Other genealogy websites, he added, seemed more open to the researchers’ recommendations for improving security.

For many, the presence of fake users in GEDmatch was as alarming as the breach itself. Genealogists know that they cannot trust names or e-mails. They also know that a user can easily upload someone else’s genetic profile. But the breach exposed that behind the scenes, hidden by privacy settings, were all kinds of profiles of people who were not even real.

The giveaway that the matches were not actual relatives was that their DNA was too good to be true, said Leah Larkin, a biologist who runs DNA Geek, a genealogical research company.

People who managed profiles for many clients and relatives repeatedly found that these fake users were somehow displayed as close relatives across unrelated profiles. Their visible ancestry information reinforced that the matches were impossible and suggested that the fake profiles had been designed to trick the site’s search algorithm for some reason.

In Edge’s paper, he warned that it was possible to create fake profiles to identify people with genetic variants associated with Alzheimer’s and other diseases.

“If something is just a geeky genealogist messing around, there is no concern,” Larkin said.

But it becomes a problem, she said, if users are trying to find people who all share a particular genetic mutation or trait, as Edge cautioned. Such information could be abused by insurance companies, pharmaceutical companies or others, she said.

The breach also reinforced something that genealogists have been saying for years: Mixing genealogy and law enforcement is messy, even when you try to draw clear lines. Until two years ago, the primary DNA databases that law enforcement used for investigations were maintained by the FBI and police.

That changed with the Golden State Killer case in 2018.

For some users, the reason for keeping their profiles private is philosophical. Even if helping law enforcement could mean helping catch a killer, they do not want their genetic information used to incriminate their relatives.