I came across Russell Garland’s story, “Social Media, Genomics Driving Data Tsunami,” in WSJ’s Venture Capital Dispatch over the weekend as I was preparing for HIMSS 11, and I began to think about how many life sciences organizations are already on the edge of a “big data tsunami.”
R&D efforts are now focused on personalized medicine, where studies leveraging genome-level information result in petabytes of data that must be organized and interpreted into results researchers can use to advance “targeted therapies.” And that assumes researchers are consolidating only their own organization’s data on a specific drug in development. Even greater volumes of data are beginning to emerge from collaborative disease networks. These networks are a combination of clinical research communities, research-based institutions, investigator networks, CROs, pharma, providers, patients, labs and payers, all trying to find answers around specific diseases. With all this information converging, the data volumes multiply rapidly, creating the need not only for a common way of viewing the data, but also for a way to manage the volume.
As the industry evolves, so does the technology supporting it. Researchers are using new tools, such as semantic search engines and advanced analytics with visualization techniques, to assist in analyzing the data, and the infrastructure to support that work has also evolved with the increased availability of affordable cloud services. Just one example of how data is being generated en masse comes from genomics research, where advances in nanoscale and microfluidic chemistry now allow DNA to be monitored on tiny beads by photographic sensors that generate TIFF images in collections of up to 800 gigabytes.
The genomics work at the Sanger Institute and the European Bioinformatics Institute (EBI) alone led to an installation of close to 10,000 cores of compute capacity and approximately 10 petabytes of raw storage capacity. To manage these storage and processing costs, organizations such as Pfizer’s Biotherapeutics & Bioinnovation Center (BBC) began using Amazon cloud services to develop and refine models, and J&J Pharma R&D is hosting analytical solutions in the cloud, specifically NONMEM applications.
The good news is that these large volumes of data are driving innovation at technology companies, which in turn helps advance the R&D engines of organizations striving to make personalized medicine a reality. Personalized healthcare will drive massive disease networks to facilitate molecular-level investigation of diseases, target optimal treatment for patients, and facilitate off-label use of personalized therapies supported by a whole ecosystem of trusted intermediaries and service providers.
Artificial genetics intends to apply personalized medicine to large patient populations by connecting large datasets of genomic information with clinical patient data. Leveraging its vast search capabilities, Google is leading some of this innovation by making large, strategic computing infrastructure investments in Adimab, a drug discovery startup, to develop “in-silico” drug models that pharmaceutical companies will use for accelerated research of targeted therapies. Simulated representations of patients, or “e-Clinical avatars,” are already being used to do “cool analysis” of clinical and genetic data. These simulations are improving patients’ response rates to drugs while also speeding the drugs’ time to market.
As the technology evolves to make sense of it all, so too does our ability to embrace the massive opportunity that this “big data tsunami” presents.