FPGA Acceleration of Protein Back-Translation and Alignment

Identifying genome functionality advances our understanding of humans and supports disease diagnosis as well as drug design, bio-material design, and the genetic engineering of plants and animals. When only sequence information is available for a protein, comparing it against a database of sequences with known functionality helps identify the functionality of the unknown sequence. The process of predicting the possible RNA sequences from which a specific protein may have originated is called back-translation. Aligning the back-translated RNA sequence against the database locates the most similar sequences, which are then used to predict the functionality of the unknown protein sequence. Because they provide massive parallelism, FPGAs can accelerate bioinformatics applications substantially. In this paper, we propose FabP (FabP is also the name of a family of proteins, "Fatty-Acid-Binding Proteins"), an optimized FPGA-based accelerator for aligning a back-translated protein sequence against a database of DNA/RNA sequences. FabP is deeply optimized to fully utilize the FPGA resources and the DRAM memory bandwidth to maximize performance. On a mid-range FPGA, FabP provides 8.1× speedup and 23.3× higher energy efficiency compared to the GPU-based implementation on a high-end NVIDIA GPU, and 24.8× speedup and 266.8× higher energy efficiency compared to the state-of-the-art CPU implementation.
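To make the back-translation and alignment steps concrete, the following is a minimal software sketch, not the FabP hardware design. It assumes the standard genetic code, an RNA-only database sequence, and a simple ungapped scoring scheme that counts matching codons; the function names `back_translate` and `best_ungapped_hit` are hypothetical and chosen for illustration only.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# for the first, second, and third codon positions ('*' marks a stop codon).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def back_translate(protein):
    """Return, for each residue, the set of RNA codons that could encode it."""
    codons_for = {}
    for codon, aa in CODON_TO_AA.items():
        codons_for.setdefault(aa, set()).add(codon)
    return [codons_for[aa] for aa in protein]


def best_ungapped_hit(protein, rna):
    """Slide the back-translated pattern along `rna` one base at a time.

    The score of a window is the number of residues whose codon set
    contains the corresponding RNA triplet (illustrative scoring only).
    Returns (score, start); (-1, -1) if `rna` is shorter than the pattern.
    """
    pattern = back_translate(protein)
    best = (-1, -1)
    for start in range(len(rna) - 3 * len(pattern) + 1):
        score = sum(
            rna[start + 3 * i : start + 3 * i + 3] in codons
            for i, codons in enumerate(pattern)
        )
        if score > best[0]:
            best = (score, start)
    return best


print(best_ungapped_hit("MKW", "CCAUGAAGUGGCC"))  # (3, 2)
```

Each window comparison in this sketch is independent of every other window, which is the kind of workload that maps naturally onto the parallelism described above; how FabP actually organizes those comparisons in hardware is not specified in this abstract.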
