Improving Performance of Codes with Large/Irregular Stride Memory Access Patterns via High Performance Reconfigurable Computers

Parallel codes with large-stride/irregular-stride (L/I) memory access patterns, e.g., sparse-matrix and linked-list codes, often perform poorly on mainstream clusters because of the general-purpose processor (GPP) memory hierarchy. High performance reconfigurable computers (HPRCs) are parallel computing clusters containing multiple GPPs and field-programmable gate arrays (FPGAs) connected via a high-speed network. In this research, simple 64-bit floating-point parallel codes are used to illustrate the performance impact of L/I memory accesses in software (SW) and FPGA-augmented (FA) codes, and to assess the benefits of mapping L/I-type codes onto HPRCs. The experiments reveal that large-stride SW codes, particularly those involving data reuse, suffer severe performance degradation relative to unit-stride SW codes. In contrast, large-stride FA codes experience minimal degradation relative to unit-stride FA codes. More importantly, for codes that involve data reuse, the experiments demonstrate performance improvements of up to nearly tenfold for large-stride FA codes over large-stride SW codes.
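To make the access patterns concrete, the following sketch (an illustrative example, not taken from the paper; the array contents and stride value are assumptions) contrasts a unit-stride traversal with a large-stride traversal over the same number of 64-bit floating-point elements. On a GPP, the strided loop touches only one element per cache line fetched, defeating spatial locality; a deeply pipelined FPGA data path is far less sensitive to the stride.

```python
N = 1 << 14          # elements actually read by each traversal
STRIDE = 64          # large stride, in elements (illustrative value)

# Working set: 64-bit floats; values cycle 0.0..7.0 (i & 7).
data = [float(i & 7) for i in range(N * STRIDE)]

# Unit-stride traversal: consecutive elements, maximal spatial locality.
unit_sum = 0.0
for i in range(N):
    unit_sum += data[i]

# Large-stride traversal: one element every STRIDE positions.
# Reads the same number of elements, but each lands on a different
# cache line, so a GPP cache hierarchy serves it far less efficiently.
stride_sum = 0.0
for i in range(0, N * STRIDE, STRIDE):
    stride_sum += data[i]

# Every strided index is a multiple of 64, so (i & 7) == 0 there.
print(unit_sum, stride_sum)
```

Both loops perform exactly N reads and N additions; only the address pattern differs, which is the variable the paper's experiments isolate.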