A way to handle data that come in petabytes
Curoverse content system expanding base for genome projects
Anyone collecting data needs a place to put it. Harvard geneticist George Church felt that need acutely in his Personal Genome Project. In the early 2000s, he had the audacious goal of sequencing some 100,000 human genomes, each 25,000 times the size of a traditional electronic record. But though his vision was ripe, the infrastructure to store and manipulate these titanic data sets was not.
Church commissioned Alexander Wait Zaranek, a computer science researcher in his lab, to scope out the tools available. None were suitable, so Zaranek and colleagues Ward Vandewege and Tom Clegg began building one. Arvados was born.
It’s a content-management system for large genomic data sets. Just as blogging platforms such as WordPress let journalists and writers upload data — text, videos, images — and work with them, Arvados lets researchers and clinicians import genetic data files. They can then run a variety of analyses or share the data.
The first generation of Arvados was activated in 2007 for the genome project. By 2013, its founders had spun off the venture as a free-standing company, Curoverse. In December 2013, Curoverse announced $1.7 million in seed funding to develop its software.
In the 10 years since the Personal Genome Project was conceived, the effort to use genetic data to improve medicine has exploded. Over the next year, researchers are expected to generate 85 petabytes of sequencing data from research subjects and patients. “That translates to about 21 million HD movies,” said Curoverse’s chief executive, Adam Berrey.
Curoverse hopes to be the invisible infrastructure powering such analyses in labs and clinics in the next decade.
So far, the system has been accessible by invitation only. Johns Hopkins University, Harvard Medical School, and the Wellcome Trust Sanger Institute (which is storing 20 petabytes of data) are among the early adopters. Starting Tuesday, any group can sign up to use the system, via a website. Curoverse also sells the system on dedicated hardware, with installation available for a fee. It is preparing for a commercial release this summer.