Low latency and division free Gauss-Jordan solver in floating point arithmetic

In many applications, the solution of a linear system is computed with Gaussian elimination followed by back-substitution, or with Gauss-Jordan elimination. The latter is intrinsically more parallel, enabling lower computing latencies at the price of more complex hardware. However, both methods require the division operator, which places a time-consuming resource on the critical path of the algorithms and increases the global processing latency. Jordan was already aware of a division-free algorithm. However, its implementation involves multiplications at each step, and the magnitude of the numbers rapidly becomes too large for an efficient implementation on large systems. In this work, we present a small modification to the division-free algorithm that keeps the magnitude of the numbers in a reasonable range for standard floating-point numbers. This is possible thanks to the special format of floating-point numbers, which enables error-free and hardware-efficient divisions by powers of two. We also propose a parallel and pipelined architecture that best exploits the proposed algorithm, including partial pivoting. We focus in particular on the global latency of the system as a function of its size, the latency of the floating-point operators, and the number of operators that are available. Results demonstrate that current FPGAs can solve linear systems of more than one hundred equations within ten microseconds. This represents a two-orders-of-magnitude improvement over previous implementations for relatively small systems.

Highlights: Low-latency solvers are necessary for real-time applications (simulation/control). The divider circuits used in most previous works induce long latencies. We propose a division-free parallel architecture adapted to floating-point arithmetic. We obtain two orders of magnitude gains compared to previous works. 100-equation systems can be solved in under 10 microseconds.
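The idea summarized in the abstract can be illustrated with a short software sketch. It is not the authors' architecture, only a sequential Python model written under assumptions: the function names `rescale_pow2` and `solve_division_free` are hypothetical, each updated row is rescaled by the power of two taken from its largest entry, and the unknowns are recovered with one division per equation after elimination has finished (the abstract does not say how the final result is obtained).

```python
import math


def rescale_pow2(row):
    """Scale a row by a power of two so its largest magnitude lies in [0.5, 1).

    Changing only the exponent is exact in binary floating point
    (barring underflow), so no rounding error is introduced.
    """
    biggest = max(abs(x) for x in row)
    if biggest == 0.0:
        return row
    _, exponent = math.frexp(biggest)  # biggest = mantissa * 2**exponent
    return [math.ldexp(x, -exponent) for x in row]


def solve_division_free(a, b):
    """Solve a x = b with division-free Gauss-Jordan and partial pivoting.

    Assumption of this sketch: one division per unknown is performed at
    the very end, outside the elimination loop.
    """
    n = len(a)
    # Augmented matrix [a | b] stored as a list of rows.
    m = [[float(x) for x in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for k in range(n):
        # Partial pivoting: pick the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if m[p][k] == 0.0:
            raise ValueError("matrix is singular")
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        for i in range(n):
            if i == k:
                continue
            factor = m[i][k]
            # Cross-multiplication replaces the usual division by the pivot.
            row = [pivot * x - factor * y for x, y in zip(m[i], m[k])]
            # Keep the numbers in range with an exact power-of-two scaling.
            m[i] = rescale_pow2(row)
    # The matrix is now diagonal; recover the unknowns.
    return [m[i][n] / m[i][i] for i in range(n)]
```

For example, `solve_division_free([[2, 1], [1, 3]], [3, 5])` returns values close to 0.8 and 1.4. Scaling a whole row of the augmented matrix by a nonzero constant does not change the solution, which is why the rescaling step is safe; in hardware it amounts to an adjustment of the exponent field rather than a full divider.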
