Paper Information - A Parallel, Energy Efficient Hardware Architecture for the merAligner on FPGA Using Chisel HCL


Science has made great progress in extracting information from DNA, and the huge amounts of data being produced call for new methods and architectures to carry out the computation efficiently. Among the analyses performed on DNA, one of the most compute-intensive is aligning a set of strings (reads) to specific targets. For this task, Lawrence Berkeley National Laboratory (LBNL) and the University of California, Berkeley (UCB) developed merAligner, a fully parallel sequence aligner that uses a seed-and-extend algorithm to perform the alignment. This aligner scales efficiently to thousands of cores on a Cray XC30 supercomputer. Despite its high computational power, this architecture consumes a significant amount of power, which considerably reduces its power efficiency. Reconfigurable hardware architectures, by contrast, have been shown to deliver high performance while keeping a relatively low power profile. In this work, we propose an FPGA architecture for the alignment step of merAligner. The architecture was designed using Chisel HCL and synthesized using Xilinx SDAccel, targeting a Xilinx Kintex UltraScale board. On a test dataset, the resulting design outperforms the merAligner alignment step by a factor of up to 7x in performance and 66x in power efficiency.
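The seed-and-extend idea mentioned in the abstract can be illustrated with a minimal software sketch. This is not the paper's FPGA design or merAligner's actual code: the function names, the seed length `k`, the window padding `pad`, and the scoring values are all hypothetical, and the extension step is assumed here to be a plain Smith-Waterman local alignment with a linear gap penalty.

```python
from collections import defaultdict


def build_seed_index(target, k):
    """Map every k-mer (seed) of the target to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of a against b (linear gap penalty).

    Scoring values are illustrative assumptions, not taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def seed_and_extend(read, target, k=4, pad=2):
    """Return (best score, start of the target window that produced it).

    Seed phase: look up each k-mer of the read in the target index.
    Extend phase: run local alignment only on a window around each hit.
    """
    index = build_seed_index(target, k)
    best = (0, None)
    seen = set()  # window starts already extended, to avoid repeated work
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            start = max(0, pos - offset - pad)
            if start in seen:
                continue
            seen.add(start)
            end = min(len(target), pos - offset + len(read) + pad)
            score = smith_waterman_score(read, target[start:end])
            if score > best[0]:
                best = (score, start)
    return best
```

For example, `seed_and_extend("ACGTAACC", "GGGTTTACGTAACCGGG")` finds the exact occurrence of the read and returns `(16, 4)`: eight matches at two points each, in a window starting two bases before the hit. Restricting the quadratic-time extension to small windows around seed hits is what makes this scheme cheaper than aligning each read against the whole target.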
