Assessing the Potential of Hybrid HPC Systems for Scientific Applications: A Case Study

We have conducted a detailed study to understand the potential of hybrid CPU/FPGA High-Performance Computers for improving the performance of data-intensive scientific applications. In particular, we have focused on an application in proteomics (Polygraph), which is representative of many types of computational analysis applications in the life sciences: it focuses on extracting useful information from a large body of experimentally collected data (identifying observed peptide spectra collected from a mass spectrometer against a well-known protein database). Our preliminary analysis of Polygraph found that more than half (51%) of the computation time was spent in three routines. We have implemented an FPGA version of the most computationally intensive routine (20% of the time) on a Cray XD-1 system, and measured the overall speedup achieved in comparison to an optimized software version of the routine running on the Cray XD-1's native Opteron processors. We have achieved computational speedups of up to 9.16. When we include data movement costs, the overall speedup is reduced to 1.78. We discuss the design and implementation strategies that led to these results, as well as the advantages and limitations we found on the Cray XD-1 platform. We also address the advantages and limitations of current development environments, and discuss relevant issues we encountered as "users" of the hybrid CPU/FPGA programming model.
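To put the reported figures in context, the following is a minimal sketch of the arithmetic they imply. It rests on assumptions not stated in the abstract: that both speedups (9.16 and 1.78) refer to the accelerated routine alone, that the routine accounts for exactly 20% of total run time, and that computation and data movement do not overlap. The function names are illustrative only and are not taken from Polygraph or any Cray tooling.

```python
def amdahl_speedup(fraction, routine_speedup):
    """Whole-application speedup (Amdahl's law) when only `fraction`
    of the run time is accelerated by `routine_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / routine_speedup)


def data_movement_share(compute_speedup, overall_speedup):
    """Share of the accelerated routine's time spent moving data,
    assuming compute and transfer times simply add (no overlap)."""
    compute_time = 1.0 / compute_speedup   # relative to the software routine
    total_time = 1.0 / overall_speedup     # compute plus data movement
    return (total_time - compute_time) / total_time


if __name__ == "__main__":
    fraction = 0.20  # routine's share of total time, from the abstract
    print(f"application speedup, compute only: {amdahl_speedup(fraction, 9.16):.2f}")
    print(f"application speedup, with data movement: {amdahl_speedup(fraction, 1.78):.2f}")
    print(f"data movement share of accelerated routine: {data_movement_share(9.16, 1.78):.0%}")
```

Under these assumptions, accelerating a single routine that takes 20% of the run time can raise whole-application performance by at most 25%, and most of the accelerated routine's time would be spent on data movement rather than computation.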
