SPIN: A Fast and Scalable Matrix Inversion Method in Apache Spark

The growth of big data in domains such as Earth sciences, social networks, and the physical sciences has led to an immense need for efficient and scalable linear algebra operations, e.g., matrix inversion. Existing methods for efficient, distributed matrix inversion on big data platforms rely on block-recursive algorithms based on LU decomposition. However, these algorithms are complex and require many side calculations, e.g., matrix multiplications, at various levels of recursion. In this paper, we propose a different scheme based on Strassen's matrix inversion algorithm (described in Strassen's original 1969 paper), which uses far fewer operations at each level of recursion. We implement the proposed algorithm and show, through extensive experimentation, that it is more efficient than state-of-the-art methods. Furthermore, we provide a detailed theoretical analysis of the proposed algorithm and derive theoretical running times that closely match the empirically observed wall-clock running times, thus explaining the U-shaped behaviour of the running time with respect to block size.
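To make the recursion concrete, the following is a minimal single-machine sketch of Strassen's block-recursive inversion in plain Python. It is not the distributed Spark implementation described in the paper. It assumes the matrix size is a power of two and that every leading block and Schur complement met during the recursion is invertible (true, for example, for symmetric positive definite matrices). The helper names (`matmul`, `sub`, `neg`, `split`, `strassen_inverse`) are hypothetical and chosen only for illustration.

```python
def matmul(X, Y):
    """Plain O(n^3) product of two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def sub(X, Y):
    """Element-wise difference X - Y."""
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def neg(X):
    """Element-wise negation of X."""
    return [[-a for a in row] for row in X]


def split(A):
    """Split a square matrix of even size into four equal quadrants."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def strassen_inverse(A):
    """Invert A by Strassen's block recursion.

    Assumptions (not checked): len(A) is a power of two, and A11 and the
    Schur complement are invertible at every level of the recursion.
    """
    if len(A) == 1:
        return [[1.0 / A[0][0]]]
    A11, A12, A21, A22 = split(A)
    I = strassen_inverse(A11)      # A11^-1
    II = matmul(A21, I)            # A21 * A11^-1
    III = matmul(I, A12)           # A11^-1 * A12
    IV = matmul(A21, III)          # A21 * A11^-1 * A12
    V = sub(IV, A22)               # negated Schur complement
    VI = strassen_inverse(V)       # its inverse
    C12 = matmul(III, VI)
    C21 = matmul(VI, II)
    VII = matmul(III, C21)
    C11 = sub(I, VII)
    C22 = neg(VI)
    # Reassemble the four quadrants of the inverse.
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Each level of this sketch performs two half-size inversions, six block multiplications, two subtractions, and one negation. In a distributed setting the quadrants would be collections of blocks rather than nested lists, and the block size would control how deep the recursion goes before a local inversion is used.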
