Accelerated Genomics Data Processing using Memory-Driven Computing

Next-generation sequencing (NGS) is the driving force behind precision medicine and is revolutionizing most, if not all, areas of the life sciences. Particularly for studies of the major common diseases, NGS data volumes are expected to grow exponentially over the coming decades. This enormous increase in NGS data, together with the need to process it quickly for real-world applications, requires us to rethink our current compute infrastructures. Here we provide evidence that Memory-Driven Computing (MDC), a novel memory-centric hardware architecture, is an attractive alternative to current processor-centric compute infrastructures. To illustrate how MDC can change NGS data handling, we used RNA-seq pseudoalignment followed by quantification as a first example. Pseudoalignment with kallisto, a tool for near-optimal probabilistic RNA-seq quantification, was accelerated by more than two orders of magnitude with identical accuracy and with an indicated 66% reduction in energy consumption. One billion RNA-seq reads were processed in just 92 seconds. MDC thus reduces data processing time and energy consumption simultaneously. Combined with MDC's inherent solutions for local data privacy, this points to a new compute model that pushes large-scale NGS data processing and primary data analytics closer to the edge by coupling high-end sequencers directly with local MDC, which also reduces the movement of large raw data sets to central cloud storage. We further envision that other data-rich areas will benefit similarly from this new memory-centric compute architecture.
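For readers unfamiliar with the workload, the toy Python sketch below illustrates the k-mer compatibility idea behind pseudoalignment: every k-mer of the transcriptome is mapped to the set of transcripts that contain it, and a read is assigned the intersection of the sets for its k-mers. All function names, transcript IDs and sequences are hypothetical. This is a simplified illustration, not kallisto's implementation, which builds a transcriptome de Bruijn graph, handles reverse complements, skips redundant k-mer lookups and follows pseudoalignment with an EM-based quantification step that is not shown here.

```python
# Toy illustration of k-mer-based pseudoalignment.
# Simplified and hypothetical: NOT kallisto's actual implementation.
from collections import defaultdict


def build_index(transcripts, k):
    """Map every k-mer to the set of transcript IDs that contain it."""
    index = defaultdict(set)
    for tid, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(tid)
    return dict(index)


def pseudoalign(read, index, k):
    """Return the set of transcripts compatible with all indexed k-mers of a read.

    Simplifying assumption: k-mers absent from the index are ignored.
    An empty set means the read is compatible with no transcript.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue  # k-mer not in the transcriptome index
        # Intersect the running compatibility set with this k-mer's transcripts.
        compatible = set(hits) if compatible is None else compatible & hits
        if not compatible:
            break  # no transcript explains every k-mer seen so far
    return compatible if compatible is not None else set()


def equivalence_class_counts(reads, index, k):
    """Count reads per equivalence class (set of compatible transcripts)."""
    counts = defaultdict(int)
    for read in reads:
        ec = frozenset(pseudoalign(read, index, k))
        if ec:  # drop reads compatible with no transcript
            counts[ec] += 1
    return dict(counts)


if __name__ == "__main__":
    # Two hypothetical transcripts sharing a prefix.
    idx = build_index({"tA": "ACGTACGGTCA", "tB": "ACGTACGTTTG"}, k=5)
    print(sorted(pseudoalign("ACGTACG", idx, 5)))  # ['tA', 'tB'] (shared prefix)
    print(sorted(pseudoalign("TACGGTC", idx, 5)))  # ['tA']
    print(sorted(pseudoalign("CGTTTG", idx, 5)))   # ['tB']
```

The design choice worth noting is that no base-level alignment is computed: each read is reduced to a set of compatible transcripts, and reads are then tallied per equivalence class, the kind of compact summary that a subsequent quantification step can work from.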
