Long missing from the biotech and high-tech map of the region, Holyoke is finally finding an advantage in its location on the western end of the Massachusetts Turnpike: It’s much faster to reach than some of the most connected places on the Internet.
Beginning this summer, life-sciences companies in the Boston area will be able to send troves of data to a new state-affiliated computing facility in Holyoke in a fraction of the time it would take to ship them to a commercial data center. Located at the new Massachusetts Green High Performance Computing Center, the life-sciences facility could lead to breakthrough drugs and other products by making it easier, faster, and even cheaper for companies to investigate leads involving large amounts of data.
That should encourage smaller operations “to try things that might fail more,” said Paul Brown, chief architect of the start-up Paradigm4, which developed a key database used in one of the nation’s largest genome projects. “Firms are going to be prepared to do things with much higher potential payoff but lower chance of success.”
The $4.5 million life-sciences computing cluster was funded by a state grant and will be installed at the Holyoke center this summer. The initiative aims to give life-sciences efforts better access to cloud computing — or the use of multiple powerful computers simultaneously to crunch data too complex for a single computer. These very large datasets, including things like genome sequences and other life-science data, are known as Big Data.
Cloud computing is available through commercial vendors such as Google Inc. and Amazon.com Inc., but using them can be both costly and slow. Just getting Big Data to computing clusters powerful enough to analyze it can be an elaborate ordeal that takes days. The size of such files can easily overwhelm the computer networks and Internet connections of most small companies, and there are bottlenecks and traffic jams along the way that further slow down delivery.
Indeed, Internet transmission can be so slow that it is quicker for companies to ship or drive their data to a computing center.
“The fastest way to get a large puddle of data from New York to LA is called the sneakernet,” Brown said. “You get a graduate student, you buy him a bus ticket, and you send it that way.”
But the life-sciences cluster in Holyoke has access to dedicated 10-gigabit-per-second fiber-optic lines that can make shipping a huge amount of data from Boston a breeze. A one-terabyte store of data, about 1 million average-size photos, can take a mere 15 or 20 minutes to travel from one of the universities in Greater Boston participating in the project to Holyoke; using the typical connections available to small businesses, a similar shipment could take days or weeks.
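The arithmetic behind those figures can be checked with a rough back-of-the-envelope sketch. The 10-gigabit-per-second rate and the one-terabyte size come from the project; the 75 percent usable-throughput figure and the 10-megabit-per-second small-business connection are illustrative assumptions, not numbers supplied by the center.

```python
def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Seconds needed to move size_bytes over a link.

    efficiency is the fraction of the nominal rate that is actually usable
    (an assumed value here, covering protocol overhead and shared traffic).
    """
    if link_bits_per_second <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link rate must be positive and efficiency in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


TERABYTE = 10**12                    # one terabyte, in bytes (decimal definition)
DEDICATED_LINK = 10 * 10**9          # 10 gigabits per second, as described above
SMALL_BUSINESS_LINK = 10 * 10**6     # assumed 10 megabits per second

# Best case: the full nominal rate of the dedicated line.
ideal_minutes = transfer_seconds(TERABYTE, DEDICATED_LINK) / 60

# Assumed 75 percent usable throughput on the same line.
realistic_minutes = transfer_seconds(TERABYTE, DEDICATED_LINK, efficiency=0.75) / 60

# The same terabyte over the assumed small-business connection.
small_business_days = transfer_seconds(TERABYTE, SMALL_BUSINESS_LINK) / 86400

print(f"Dedicated line, ideal:        {ideal_minutes:.1f} minutes")
print(f"Dedicated line, 75% usable:   {realistic_minutes:.1f} minutes")
print(f"Small-business connection:    {small_business_days:.1f} days")
```

Under these assumptions the dedicated line moves a terabyte in roughly 13 to 18 minutes, in line with the 15 or 20 minutes cited, while the slower connection needs more than a week.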
“It’s like having a new Highway 90, but for data,” said Christopher Hill of Massachusetts Institute of Technology, the principal investigator for the project. “It’s a new virtual highway from all these universities to this facility in Holyoke.”
The one catch is that the super-fast link is between the Holyoke facility and the five schools involved in the project: MIT, the University of Massachusetts, Harvard University, Northeastern University, and Boston University. Companies that want to send data to Holyoke would still have to get it to the schools first for high-speed transit.
Another feature of the project is that it should be cheaper for companies to use the Holyoke computers than a commercial cloud center, because the life-sciences cluster is funded by the state and its maintenance costs are underwritten by the five-university consortium.
“The data is local. You don’t have to move it up into some storage cloud, buy the storage cloud from a provider, like Google or Amazon, then compute it. This is in our backyard. It’s easy to consume, and it’s cost-effective,” said chief health care strategist David Dimond of EMC Corp., the Hopkinton-based computer storage giant.
EMC is one of a number of heavyweight business partners backing the venture; others include IBM Corp., AstraZeneca PLC, Pfizer Inc., Merck & Co., and Merrimack Pharmaceuticals Inc. Officials from those companies are helping in the design of the center and will probably collaborate with experts from the universities to use the center’s resources. Through such collaborations, industry researchers will be able to “peek inside” the computers as they churn through problems, said Prashant Shenoy of the University of Massachusetts, an investigator on the project.
“Because we own the machines, we can get a much deeper idea of how researchers are using the machines than if they used the commercial cloud,” he said.
One enterprise that expects to use the Holyoke computer cluster is hack/reduce, a Cambridge nonprofit that helps companies with Big Data challenges and hosts hack-a-thons at its Kendall Square offices. Hack/reduce expects to use the Holyoke cluster to run frequent public computing challenges on life-sciences projects.
Its founder, Chris Lynch, said those sessions will help the Holyoke center build a bridge to the entrepreneurial community in Boston and Cambridge.
“It’s really providing a platform for engagement,” Lynch said.
The end goal of the life-sciences cluster, said the director of the Holyoke center, John Goodhue, is to promote collaborative research between companies and academics in genomics and related fields, with an eventual aim of developing diagnoses and treatments that are specific or unique to each patient or illness. For example, being able to distinguish the genetic fingerprint of one cancer tumor from another may allow researchers to develop singular treatments for each patient.
The Holyoke cluster is also expected to help small and big companies alike tackle problems not well suited to the commercial cloud, said John Reynders, the head of AstraZeneca’s informatics research and development.
Some problems can easily be split into parts that run on separate computers. Other challenges can’t be split easily, because the sets of computers performing different tasks need to constantly interact with one another as they puzzle through the data. That’s something the Holyoke facility can do, but a commercial cloud operator cannot, Reynders said.
For example, scientists might be searching for a common biomarker among Alzheimer’s patients who have responded well to a single medication, which requires sophisticated analysis across multiple data sets.
“You have to bring together imaging data, genetic data, clinical data, proteomic data,” Reynders said. “That’s the kind of puzzle we see the platform . . . being able to help us crack.”
AstraZeneca is among the big companies that expect to take advantage of their proximity to the Holyoke facility for experiments with Big Data.
“Certainly having an environment like Holyoke —