Spark-based large-scale matrix inversion for big data processing

Matrix inversion is a fundamental operation for solving linear equations in many computational applications. However, inverting large-scale matrices of extremely high order (several thousand) is challenging, and such matrices are common in most web-scale systems, such as social networks and recommendation systems. In this paper, we present an LU-decomposition-based block-recursive algorithm for large-scale matrix inversion, together with an implementation on the Spark parallel computing platform that uses an optimized data structure, reduced space complexity, and efficient matrix multiplication. The experimental results show that the proposed algorithm inverts large-scale matrices efficiently on a cluster of commodity servers and scales to even larger matrices. The proposed algorithm and implementation provide a solid base for building a high-performance linear algebra library on Spark for big data processing.
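To make the idea of block-recursive inversion via LU decomposition concrete, the following is a minimal single-machine sketch in plain Python. It is not the Spark implementation described in this paper: the function names (`lu_inverse_factors`, `invert`, and the small helpers) are hypothetical, the matrices are plain lists of lists, and the sketch assumes that no pivoting is needed (every leading principal block is invertible). It splits A into four blocks, factors the top-left block recursively, forms the Schur complement, factors that recursively, and assembles the inverses of the triangular factors so that the inverse of A is the product of U's inverse and L's inverse.

```python
def matmul(X, Y):
    """Dense matrix product of two list-of-lists matrices."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def sub(X, Y):
    """Element-wise difference X - Y."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def neg(X):
    """Element-wise negation."""
    return [[-a for a in row] for row in X]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def split(A, h):
    """Split A into four blocks at row/column index h."""
    A11 = [row[:h] for row in A[:h]]
    A12 = [row[h:] for row in A[:h]]
    A21 = [row[:h] for row in A[h:]]
    A22 = [row[h:] for row in A[h:]]
    return A11, A12, A21, A22

def assemble(B11, B12, B21, B22):
    """Inverse of split: glue four blocks back into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(B11, B12)]
    bottom = [r1 + r2 for r1, r2 in zip(B21, B22)]
    return top + bottom

def lu_inverse_factors(A):
    """Return (L_inv, U_inv) where A = L U, L unit lower and U upper triangular.

    Assumption of this sketch: no pivoting, so every pivot must be nonzero.
    """
    n = len(A)
    if n == 1:
        if A[0][0] == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        return [[1.0]], [[1.0 / A[0][0]]]
    h = n // 2
    A11, A12, A21, A22 = split(A, h)
    L11i, U11i = lu_inverse_factors(A11)      # A11 = L11 U11
    U12 = matmul(L11i, A12)                   # U12 = L11^-1 A12
    L21 = matmul(A21, U11i)                   # L21 = A21 U11^-1
    S = sub(A22, matmul(L21, U12))            # Schur complement = L22 U22
    L22i, U22i = lu_inverse_factors(S)
    # Off-diagonal blocks of the inverses of the triangular factors.
    L21i = neg(matmul(matmul(L22i, L21), L11i))
    U12i = neg(matmul(matmul(U11i, U12), U22i))
    L_inv = assemble(L11i, zeros(h, n - h), L21i, L22i)
    U_inv = assemble(U11i, U12i, zeros(n - h, h), U22i)
    return L_inv, U_inv

def invert(A):
    """A^-1 = U^-1 L^-1 for A = L U."""
    L_inv, U_inv = lu_inverse_factors(A)
    return matmul(U_inv, L_inv)

if __name__ == "__main__":
    A = [[4.0, 3.0, 2.0], [6.0, 3.0, 1.0], [2.0, 1.0, 3.0]]
    for row in matmul(A, invert(A)):          # should be close to the identity
        print([round(x, 9) for x in row])
```

In a distributed setting, the block products inside the recursion are where most of the work lies, which is why efficient matrix multiplication matters for this kind of algorithm. A production version would also need a pivoting strategy for numerical stability, which this sketch omits.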
