FPGA Accelerated INDEL Realignment in the Cloud

The amount of data being generated in genomics is predicted to be between 2 and 40 exabytes per year for the next decade, making genomic analysis the new frontier and the new challenge for precision medicine. This paper explores targeted deployment of hardware accelerators in the cloud to improve the runtime and throughput of immense-scale genomic data analyses. In particular, INDEL (INsertion/DELetion) realignment is a critical operation that enables diagnostic testing for cancer through error correction prior to variant calling. It is the slowest part of the somatic (cancer) genomic analysis pipeline (the alignment refinement pipeline), and it represents roughly one-third of the execution time of time-sensitive diagnostics for acute cancer patients. To accelerate genomic analysis, this paper describes a hardware accelerator for INDEL realignment (IR) and a hardware-software framework leveraging FPGAs-as-a-service in the cloud. We chose to implement genomics analytics on FPGAs because genomic algorithms are still rapidly evolving (e.g., the de facto standard "GATK Best Practices" has had five releases since January of this year). We chose to deploy genomics accelerators in the cloud to reduce capital expenditure and to provide a more quantitative performance and cost analysis. We built and deployed a sea of IR accelerators on AWS EC2 F1 instances using our hardware-software accelerator development framework. We show that our IR accelerator system performed 81× better than multi-threaded genomic analysis software while being 32× more cost efficient.

Keywords: Computer Architecture, Microarchitecture, Accelerator Architecture, Hardware Specialization, Genomic Analytics, INDEL Realignment, FPGA Acceleration, FPGAs-as-a-service, Cloud FPGAs
