On-Chip Reconfigurable Hardware Accelerators for Popcount Computations

Popcount computations are widely used in areas such as combinatorial search, data processing, statistical analysis, and bio- and chemical informatics. In many practical problems the initial data set is very large, so increasing throughput is important. The paper suggests two types of hardware accelerators: (1) accelerators designed in FPGAs and (2) accelerators implemented in Zynq-7000 all programmable systems-on-chip, with algorithms that use popcounts partitioned between software running on the ARM Cortex-A9 processing system and advanced programmable logic. A three-level system architecture that includes a general-purpose computer, the problem-specific ARM, and reconfigurable hardware is then proposed. The results of experiments and comparisons with existing benchmarks demonstrate that, although FPGA-based designs interacting with general-purpose computers increase the throughput of popcount computations, communication overheads in experiments with PCI Express are significant, and actual advantages can be gained if not only popcount but also other types of relevant computations are implemented in hardware. A comparison of software/hardware designs for Zynq-7000 all programmable systems-on-chip with pure software implementations on the same Zynq-7000 devices demonstrates a performance increase by a factor ranging from 5 to 19, taking into account all communication overheads between the programmable logic and the processing system.
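For readers unfamiliar with the operation, the popcount (Hamming weight) of a binary vector is the number of its bits that are set to 1. The following is a minimal software sketch of that computation, not the hardware design described in the paper. The function names `popcount64`, `popcount_vector`, and `hamming_distance` are hypothetical and chosen only for illustration, and the 64-bit word width is an assumption. The word-level routine uses the well-known parallel bit-summing (SWAR) technique, which adds neighbouring bit fields pairwise, loosely analogous to an adder tree.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF  # assumed word width: 64 bits


def popcount64(x: int) -> int:
    """Count the set bits of a 64-bit word by pairwise field summation."""
    x &= MASK64
    # Each 2-bit field now holds the number of ones it originally contained.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # The multiplication accumulates all byte sums into the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


def popcount_vector(words) -> int:
    """Popcount of a long binary vector stored as a sequence of 64-bit words."""
    return sum(popcount64(w) for w in words)


def hamming_distance(a: int, b: int) -> int:
    """Hamming distance of two 64-bit words: popcount of their XOR."""
    return popcount64(a ^ b)
```

A long vector would be passed to `popcount_vector` as a list of words. In a pure software baseline of this kind every word is processed sequentially, which is the sort of workload that the abstract proposes to offload to programmable logic.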
