Performance modeling of hyper-scale custom machine for the principal steps in block Wiedemann algorithm

Solving large-scale sparse linear systems over GF(2) plays a key role in fluid mechanics, simulation and design of materials, petroleum seismic data processing, numerical weather prediction, computational electromagnetics, and numerical simulation of nuclear explosions, so developing algorithms for this problem is a significant research topic. In this paper, we propose a hyper-scale custom supercomputer architecture that matches the specific data features of the key procedure of the block Wiedemann algorithm, together with a parallel algorithm for that procedure on the custom machine. We propose four optimization strategies to improve computation, communication, and storage performance, and we build a performance model to evaluate the execution performance and power consumption of the custom machine. The model shows that the optimization strategies yield a considerable speedup: the custom machine runs as much as three times faster than the fastest supercomputer, TH2 (Tianhe-2), while consuming less power.
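The abstract does not spell out the key procedure. In the block Wiedemann algorithm the dominant cost is commonly the repeated product of the sparse matrix with a block of vectors over GF(2), so the following is a minimal Python sketch of that kernel only. The row-wise column-index storage, the packing of the block into one integer per row, and the names `spmv_gf2` and `krylov_iterates` are assumptions made for illustration; this is not the parallel algorithm or data layout of the custom machine.

```python
from typing import List


def spmv_gf2(rows: List[List[int]], v: List[int]) -> List[int]:
    """Multiply a sparse GF(2) matrix B by a block of vectors.

    rows[i] lists the column indices of the nonzero entries in row i of B.
    v[j] is an integer whose bits hold row j of the block (one bit per vector).
    Addition in GF(2) is XOR, so each output row is the XOR of selected rows of v.
    """
    result = []
    for cols in rows:
        acc = 0
        for j in cols:
            acc ^= v[j]  # add row j of the block, modulo 2
        result.append(acc)
    return result


def krylov_iterates(rows: List[List[int]], v: List[int], steps: int) -> List[List[int]]:
    """Return [v, B*v, B^2*v, ..., B^steps*v] by repeated multiplication."""
    iterates = [list(v)]
    for _ in range(steps):
        iterates.append(spmv_gf2(rows, iterates[-1]))
    return iterates


if __name__ == "__main__":
    # Hypothetical 3x3 matrix: row 0 has ones in columns 0 and 1, and so on.
    B = [[0, 1], [1, 2], [0]]
    # Block of two vectors, packed as 2-bit integers (one per matrix column).
    block = [0b01, 0b10, 0b11]
    for k, it in enumerate(krylov_iterates(B, block, 2)):
        print(k, [format(word, "02b") for word in it])
```

Packing the block into machine words lets one XOR update every vector in the block at once, which is one reason the kernel tends to be limited by memory access and communication rather than by arithmetic.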
