
Somerville and Seattle scientists use A.I. to design proteins from scratch

Generate Biomedicines and the University of Washington are making advances in a hot field.

Examples of protein structures created from scratch by the Somerville-based startup Generate Bio's artificial intelligence programs. Generate Bio

A Somerville startup and an academic lab in Seattle say they have developed a way to use artificial intelligence to design proteins that don’t exist in nature.

Proteins help people move, digest food, and fight infections — to name a few of their numerous functions. They’re also the basis of a nearly $300 billion drug industry for treating cancer, immune diseases, and other conditions. Most of these therapies are only slightly altered versions of natural proteins. And for some scientists, nature is too limiting.

“Nature has invented a lot of proteins. It’s had three and a half billion years to do that,” said Gevorg Grigoryan, chief technology officer at Generate Biomedicines in Somerville. “But it turns out that among the hundreds of millions of protein sequences we know about, there are not actually that many unique ideas.”


Chafing at the constraints of working with what evolution has already made, a growing number of academic researchers and biotech companies, including Generate, are devising methods for creating proteins from scratch. The field, known as de novo protein design, is attracting biologists and programmers alike, including teams at tech giants Alphabet, Meta, and Microsoft.

The latest advance in the competitive and fast-moving field came Thursday when Generate disclosed details about its new method for protein design in a paper posted on its website. It offers a rare glimpse under the hood of the startup, which has tripled its headcount from about 80 to 240 employees over the last year and raised at least $470 million since it was founded in 2018 by Flagship Pioneering, the Cambridge biotech investment firm behind Moderna.

David Baker, a prominent biochemist who leads the Institute for Protein Design at the University of Washington, also shared an unpublished manuscript with the Globe describing a similar method from his group. “We should be able to design new proteins that carry out functions more precisely than anything you can make by retooling what’s in nature,” he said.


Both groups relied on an increasingly popular method in the artificial intelligence field, known as a diffusion model, which programmers have used to make computer apps that can create images from simple prompts — including the eerily good image generator DALL-E. The two teams set out to create a similar system for generating new proteins on computers with properties that a scientist can specify.

The results, described in lengthy papers that have not undergone scientific peer review, reveal a menagerie of mesmerizing shapes — rings, spheres, and snowflakes the likes of which nature has never seen nor made.

Computer models of symmetrical protein structures designed by Generate Bio's newly disclosed artificial intelligence tool. Generate Bio

“Proteins are the machines of biology — the motors, the sensors — that make everything go,” said Michael Nally, chief executive of Generate. “We can now create these machines in whatever shape we need to drive the desired biological task.”

Experts who read the papers for the Globe were impressed with the research, but cautioned that both groups have a long way to go before proving that their methods will lead to new medicines.

“They’re both significant advances,” said Gaetano Montelione, a protein scientist at Rensselaer Polytechnic Institute. But he noted that it’s one thing to design a protein on a computer and another to make it in real life and prove that the molecule looks and behaves as expected. Generate’s paper doesn’t include that experimental data and Baker’s paper only includes it for one small protein.


Both groups concede the importance of such work and say it is underway. “The ultimate test of all of this is experimental validation,” Grigoryan said. “Experimental validation is feverishly ongoing right now, so we’re definitely looking forward to sharing the results of that in the near future.”

Experts suggested that Generate may be feeling pressure to reveal some of its work, but not its full hand, to show potential investors and employees that it’s keeping up with the hottest trends — namely, the diffusion models that are all the rage in deep learning, a subfield of artificial intelligence.

“If you’re doing something with deep learning and you’re not working on diffusion models, you’re left behind at this point,” said Nicholas Polizzi, a protein designer at Dana-Farber Cancer Institute. “It’s sort of a Wild West with diffusion models right now, because I think everybody knows this is the next big thing in protein design.”

Nally said that his firm’s newly disclosed protein design tool, dubbed Chroma, “is one component in a much broader computational arsenal,” albeit an important one to share publicly. “We think being an active contributor to advances in the field helps establish Generate as one of the best places to do cutting edge work in these domains.”

Generate Bio chief technology officer Gevorg Grigoryan and head of machine learning John Ingraham led the development of the startup's new tool for designing proteins on computers. Generate Bio

Generate has recruited a small army of biologists, engineers, and programmers to undertake the task, and Nally said the headcount could grow to 400 by the end of 2023. The firm already has a pipeline of 14 preclinical protein therapies, including five with its pharmaceutical partner, Amgen. Generate expects to start another 10 to 15 drug programs of its own next year, Nally said.


The startup’s two most advanced drug candidates are antibodies: one for treating COVID-19 and another for asthma. Clinical trials of the COVID-19 therapy, which was designed to be resistant to all known variants of the coronavirus, could start in the spring. But the company said that neither of its first two drug candidates was made with its new protein design system.

Antibodies are the bread and butter of the protein therapy industry and hardly qualify as proteins that don’t exist in nature. But Generate hopes it can use computers to design better antibodies more quickly than traditional approaches that rely on obtaining the proteins from animals or humans as a starting point.

Baker, who has been a founder of or advisor to at least 18 biotech companies, said that his group’s protein design system is not part of any biotech company yet. “But it very likely will be.”

Both groups’ studies are expected to appear soon on the preprint server bioRxiv and will be submitted to peer-reviewed journals later.

One of the protein structures designed by David Baker's group on the computer. When the researchers made and tested the protein in the lab, it bound tightly to a hormone as predicted. Ian C Haydon/UW Institute for Protein Design

Other groups, including researchers at Stanford University and Microsoft, have already posted studies of their own protein design systems powered by diffusion models on the preprint server arXiv, favored by computer scientists. One of the Stanford researchers, Namrata Anand, recently launched a new startup called Diffuse Bio based on the technology.


“There is a new protein diffusion preprint coming out almost every week,” said Sergey Ovchinnikov, a science fellow at Harvard University whose lab studies protein evolution.

Chris Bahl, founder, president, and chief scientific officer of the Boston startup AI Proteins, said it was easy to tell when early AI image generators failed. “They made horrifying nightmares of people with scrambled faces,” he said. Today, the apps are much better, thanks to diffusion models.

Yet until scientists physically make and assess their computer-designed proteins in a lab, it’s impossible to know if they are the real deal or the molecular equivalent of a scrambled face, Bahl said. “I’m very excited to see how it performs when they do experiments with it.”

Follow Ryan Cross on Twitter @RLCscienceboss.