Spark-based large-scale matrix inversion for big data processing

Matrix inversion is a fundamental operation for solving linear equations in many computational applications. However, inverting large-scale matrices of extremely high order (several thousand) is challenging, and such matrices are common in most web-scale systems such as social networks and recommendation systems. In this paper, we present an LU decomposition based block-recursive algorithm for large-scale matrix inversion, together with an implementation on the Spark parallel computing platform that uses an optimized data structure, reduced space complexity, and effective matrix multiplication. The experimental evaluation shows that the proposed algorithm inverts large-scale matrices efficiently on a cluster of commodity servers and scales to even larger matrices. The proposed algorithm and implementation can serve as a solid base for building a high-performance linear algebra library on Spark for big data processing.
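To make the idea of block-recursive, LU-based inversion concrete, the following is a minimal single-machine sketch in plain Python. It is not the paper's Spark implementation: it assumes a square matrix whose leading principal minors are all nonzero (no pivoting is performed), it stores matrices as lists of rows, and every function name (`lu`, `inv_lower`, `inv_upper`, `invert`, and the helpers) is hypothetical. The sketch splits the matrix into 2x2 blocks, factors the blocks recursively, and then forms the inverse as the product of the inverted upper and lower factors.

```python
def matmul(a, b):
    # Dense product of two matrices stored as lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def neg(a):
    return [[-x for x in row] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def split(a, k):
    # Partition a into four blocks; the top-left block is k x k.
    return ([r[:k] for r in a[:k]], [r[k:] for r in a[:k]],
            [r[:k] for r in a[k:]], [r[k:] for r in a[k:]])

def join(b11, b12, b21, b22):
    # Reassemble four blocks into one matrix.
    top = [r1 + r2 for r1, r2 in zip(b11, b12)]
    bottom = [r1 + r2 for r1, r2 in zip(b21, b22)]
    return top + bottom

def inv_lower(l):
    # Recursive inverse of a lower-triangular matrix:
    # [[L11, 0], [L21, L22]]^-1 = [[L11^-1, 0], [-L22^-1 L21 L11^-1, L22^-1]]
    n = len(l)
    if n == 1:
        return [[1.0 / l[0][0]]]
    k = n // 2
    l11, _, l21, l22 = split(l, k)
    i11, i22 = inv_lower(l11), inv_lower(l22)
    i21 = neg(matmul(matmul(i22, l21), i11))
    return join(i11, zeros(k, n - k), i21, i22)

def inv_upper(u):
    # An upper-triangular inverse is the transpose of the inverse of its transpose.
    return transpose(inv_lower(transpose(u)))

def lu(a):
    # Block-recursive LU factorization without pivoting (assumption: a11 and
    # every Schur complement met during the recursion are nonsingular).
    n = len(a)
    if n == 1:
        return [[1.0]], [[float(a[0][0])]]
    k = n // 2
    a11, a12, a21, a22 = split(a, k)
    l11, u11 = lu(a11)
    u12 = matmul(inv_lower(l11), a12)          # U12 = L11^-1 A12
    l21 = matmul(a21, inv_upper(u11))          # L21 = A21 U11^-1
    l22, u22 = lu(sub(a22, matmul(l21, u12)))  # factor the Schur complement
    return (join(l11, zeros(k, n - k), l21, l22),
            join(u11, u12, zeros(n - k, k), u22))

def invert(a):
    # A = L U, so A^-1 = U^-1 L^-1.
    l, u = lu(a)
    return matmul(inv_upper(u), inv_lower(l))
```

In a distributed setting, each block product in this recursion would be a candidate for a parallel matrix multiplication, which is presumably why the abstract highlights effective matrix multiplication. The sketch does not attempt to model that part.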

[23]  Cheng Fang,et al.  Spark-based large-scale matrix inversion for big data processing , 2016, INFOCOM Workshops.
