Hiawatha Bray | Tech Lab

The next big thing in data storage is actually microscopic

From left, Catalog’s chief science officer, Devin Leake; CEO Hyunjun Park; and scientist Milena Lazova.

For the founders of a new Boston startup, there’s no such thing as too much information.

Catalog, a company founded by scientists from Harvard University and the Massachusetts Institute of Technology, designs systems that store data on manmade versions of microscopic DNA molecules, instead of on bulky magnetic tapes or silicon chips.

DNA is nature’s own hard drive, storing data by assembling itself in millions of different combinations of just four chemical compounds found in nearly every living cell. Human DNA is so small you need a microscope to see it, but the strand of DNA in a single human cell contains about 800 megabytes of information.
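The 800-megabyte figure can be checked with rough arithmetic. The sketch below assumes about 3.2 billion base pairs in one copy of the human genome, a commonly cited approximation that does not come from this article. Each position takes one of four values, so it is counted as 2 bits.

```python
# Back-of-envelope estimate; the genome length is an assumed round figure,
# not a number taken from the article.
ASSUMED_BASE_PAIRS = 3_200_000_000
BITS_PER_BASE = 2  # four possible bases -> log2(4) = 2 bits

total_bits = ASSUMED_BASE_PAIRS * BITS_PER_BASE
total_bytes = total_bits // 8
megabytes = total_bytes / 1_000_000

print(f"{megabytes:.0f} MB")  # prints "800 MB"
```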


Scientists have been working on replicating the idea behind DNA with artificial versions made in a lab that could store computerized data using the same sequencing techniques found in human genes. The process is laborious and expensive, but the tech giant Microsoft has said it expects to deliver a commercial version by the end of this decade.

But now Catalog, a tiny startup with just $9 million in new funding, threatens to beat Microsoft to the punch, pledging to deliver the first commercial DNA storage product sometime next year. Big-data generators such as corporations and government agencies could use it to store billions of gigabytes of information in a space the size of a bedroom closet.

“It’s a new generation of information storage technology that’s got a million times the information density, compared to flash” storage, said Catalog’s chief executive, Hyunjun Park, referring to the flash memory chips used in digital cameras and thumb drives. “You can shrink down entire data centers into shoeboxes of DNA.”

Though the technology is daunting, the idea behind it seems simple enough.

You take the four main chemical compounds in DNA — cytosine, adenine, thymine and guanine, or C, A, T, and G for short — and use them as shorthand for digital information, just as a computer uses the numbers 1 and 0 as a binary code for all the data it stores.


Scientists can manipulate the order of the compounds so that different sequences of C, A, T, and G correspond to certain sequences of 0’s and 1’s, and huge strings of these compounds can represent the information of a massive data file — every Hollywood film ever made, for instance, or the complete Library of Congress.
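The basic idea can be sketched in a few lines of Python. This is a toy illustration, not Catalog's method, which the article does not describe. The two-bits-per-letter table and the function names `encode` and `decode` are assumptions made up for the example.

```python
# Hypothetical mapping chosen only for illustration: every pair of bits
# becomes one DNA letter. Real systems may use very different schemes.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A, C, G and T, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(bases: str) -> bytes:
    """Turn a string of A, C, G and T back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in bases)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four letters, so a file of any size maps to one long string of letters that can be translated back to the same bytes.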

The beauty of the technology is that DNA is so dense you can pack a ridiculous amount of data into a microscopic volume. At Catalog, the finished recording resembles a thin, almost invisible film. To access the stored data, the DNA is mixed with water and put into a machine that reads the C, A, T, and G sequences and translates them back into binary form.

Moreover, DNA is far more durable than magnetic tapes and hard drives, which deteriorate in a few decades. Park said a DNA data archive could remain readable for thousands of years.

Park said Catalog has stored the Douglas Adams science fiction novel “The Hitchhiker’s Guide to the Galaxy” in DNA to demonstrate that the concept works. Now all that Park and the few other entrepreneurs and researchers in the field have to do is prove that DNA storage can be a viable commercial product.

“I think there’s going to be massive challenges, and the biggest challenge is cost,” said Sri Kosuri, assistant professor of chemistry and biochemistry at the University of California Los Angeles.


Kosuri, who worked on DNA data storage at Harvard Medical School, said the process currently is “about six orders of magnitude more expensive than it needs to be.” Making it a practical tool for everyday use “requires some technological breakthrough that I haven’t seen, as yet,” Kosuri said.

Catalog cofounder Park was a postdoctoral associate at MIT, and his cofounder Nathaniel Roquet earned a doctorate in biophysics from Harvard. In 2016, they began developing the company at IndieBio, a San Francisco incubator for biotech startups, and moved to Boston after winning a spot in Harvard’s biotech incubator, the Life Lab. Now they have attracted funding from a host of venture investors, including New Enterprise Associates, OS Fund, Day One Ventures, Data Collective, and Green Bay Ventures.

At least one other startup, Iridia, based in San Diego, is developing DNA storage systems. And in 2017, Microsoft said that its research department was hard at work on a similar system that it planned to offer as a commercial product by 2020. Microsoft declined to comment.

Follow Hiawatha Bray on Twitter @GlobeTechLab.