Compressing three-dimensional sparse arrays using inter- and intra-task parallelization strategies on Intel Xeon and Xeon Phi

Array operations are used in many scientific codes. In recent years, several applications, such as geological analysis and medical image processing, have relied on array operations over three-dimensional (abbreviated as “3D”) sparse arrays. Because these operations are computationally expensive, it is necessary to compress 3D sparse arrays and to use parallel computing technologies to speed up sparse array operations. Compressing sparse arrays efficiently is therefore an important task for practical applications. Hence, in this paper, two strategies, inter- and intra-task parallelization (abbreviated as “ETP” and “RTP”, respectively), are presented for compressing 3D sparse arrays. Each strategy was designed and implemented on both Intel Xeon and Intel Xeon Phi. In the experimental results, the ETP strategy achieves speedup ratios of 17.5× and 18.2× on Intel Xeon E5-2670 v2 and Intel Xeon Phi SE10X, respectively, while the RTP strategy achieves a speedup ratio of 4.5× on both platforms.
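
The difference between the two task decompositions can be illustrated with a minimal sketch. It is not the implementation from the paper: the CRS-like layout (`row_ptr`, `col_idx`, `values`), the flattening of the first two dimensions into rows, and all function names are assumptions made for illustration only. Python threads are used solely to show how the work is divided and are not expected to reproduce the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def compress_rows(rows):
    """Compress dense rows into a CRS-like triple (assumed layout)."""
    row_ptr, col_idx, values = [0], [], []
    for row in rows:
        for j, v in enumerate(row):
            if v != 0:
                col_idx.append(j)
                values.append(v)
        row_ptr.append(len(values))  # end of this row in values
    return row_ptr, col_idx, values


def flatten_3d(array3d):
    """View an n1 x n2 x n3 nested list as n1*n2 rows of length n3."""
    return [row for plane in array3d for row in plane]


def compress_inter_task(arrays, workers=4):
    """Inter-task sketch: each worker compresses one whole 3D array."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda a: compress_rows(flatten_3d(a)), arrays))


def compress_intra_task(array3d, workers=4):
    """Intra-task sketch: workers cooperate on one array, one plane per task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(compress_rows, array3d))
    # Merge the per-plane results, shifting row pointers by the running offset.
    row_ptr, col_idx, values = [0], [], []
    for part_ptr, part_cols, part_vals in parts:
        offset = len(values)
        row_ptr.extend(offset + p for p in part_ptr[1:])
        col_idx.extend(part_cols)
        values.extend(part_vals)
    return row_ptr, col_idx, values


if __name__ == "__main__":
    a = [[[0, 1, 0], [0, 0, 0]], [[2, 0, 3], [0, 0, 4]]]
    print(compress_intra_task(a))
    print(compress_inter_task([a, a]))
```

In this sketch the inter-task variant needs no merge step because every array is independent, whereas the intra-task variant pays for combining partial results. That cost is one plausible reason the two strategies could scale differently, though the abstract does not state the cause.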
