ReneGENE-GI: Empowering Precision Genomics with FPGAs on HPCs

Genome Informatics (GI) is a holistic, interdisciplinary approach to understanding genomic big data from a computational perspective. Within the next decade, omics data production is expected to approach one zettabase per year, at very low cost. There is a pressing need to bridge the gap between the ability of Next Generation Sequencing (NGS) technology to generate omics big data and our computational capacity for omics data management, processing, analytics and interpretation. High Performance Computing (HPC) platforms are a natural choice for bio-computing: they offer high degrees of parallelism and scalability and can accelerate the multi-stage GI computational pipeline. Even with such computing power, it is the choice of algorithms and implementations across the entire GI pipeline that determines how precisely bio-computing reveals biologically relevant information. In this paper, we present ReneGENE-GI, an innovatively engineered GI pipeline. We also present a performance analysis of ReneGENE-GI’s Comparative Genomics Module (CGM), prototyped on a reconfigurable bio-computing accelerator platform. Alignment time for this prototype is about one-tenth of that taken by the single-GPU OpenCL implementation of ReneGENE-GI’s CGM, which is itself 2.62x faster than CUSHAW2-GPU (the GPU CUDA implementation of CUSHAW). Since the single-GPU implementation demonstrates a speedup of more than 150x over standard heuristic aligners such as BFAST, the reconfigurable accelerator version of ReneGENE-GI’s CGM is several orders of magnitude faster than its competitors, while offering precision over heuristics.
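
As a rough reading of the figures above, the reported ratios can be chained to estimate how the reconfigurable prototype compares with the external aligners. This is only a sketch: the symbols below are introduced here for illustration, and the estimate assumes that all ratios refer to alignment time on comparable workloads, so that they compose multiplicatively. The abstract itself does not report these combined numbers.

```latex
% Hypothetical symbols (not from the paper):
%   T_FPGA  : alignment time of the reconfigurable accelerator prototype
%   T_GPU   : alignment time of the single-GPU OpenCL implementation
%   T_CUSHAW: alignment time of CUSHAW2-GPU
%   T_BFAST : alignment time of BFAST
% Assumption: the ratios are measured on comparable workloads.
\[
\frac{T_{\mathrm{CUSHAW}}}{T_{\mathrm{FPGA}}}
  = \frac{T_{\mathrm{CUSHAW}}}{T_{\mathrm{GPU}}}
    \cdot \frac{T_{\mathrm{GPU}}}{T_{\mathrm{FPGA}}}
  \approx 2.62 \times 10 \approx 26
\]
\[
\frac{T_{\mathrm{BFAST}}}{T_{\mathrm{FPGA}}}
  = \frac{T_{\mathrm{BFAST}}}{T_{\mathrm{GPU}}}
    \cdot \frac{T_{\mathrm{GPU}}}{T_{\mathrm{FPGA}}}
  \gtrsim 150 \times 10 = 1500
\]
```

Under these assumptions the prototype would be roughly 26x faster than CUSHAW2-GPU and on the order of 1500x faster than BFAST, which is about three orders of magnitude.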
