Numerically Stable Polynomially Coded Computing

We study the numerical stability of coded large-scale matrix multiplication in distributed systems whose worker nodes are prone to failures and delays. We construct new codes that achieve fault tolerance comparable to that of previous codes but are more numerically stable. Unlike previous codes, which use polynomials expanded in a monomial basis, our codes use polynomials expressed in a basis of orthonormal polynomials. We show, through new theoretical results on the condition number and through numerical experiments, that these codes can yield significantly more numerically stable computation than existing monomial-basis codes.
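
The gap between the two bases can be illustrated with a minimal sketch. It is not the construction from the paper: it assumes Chebyshev polynomials of the first kind as the orthogonal basis (unnormalized), Chebyshev nodes in [-1, 1] as evaluation points, and no stragglers (all evaluations are used). The helper names `chebyshev_nodes` and `condition_numbers` are hypothetical. The sketch compares the 2-norm condition number of the monomial Vandermonde matrix with that of the Chebyshev-Vandermonde matrix at the same points.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander


def chebyshev_nodes(n):
    """Return the n roots of the degree-n Chebyshev polynomial T_n (assumed evaluation points)."""
    k = np.arange(n)
    return np.cos((2 * k + 1) * np.pi / (2 * n))


def condition_numbers(n):
    """Return (monomial, Chebyshev) 2-norm condition numbers for n nodes.

    Both matrices are n x n and evaluate a degree-(n-1) polynomial basis
    at the same n points; only the choice of basis differs.
    """
    x = chebyshev_nodes(n)
    # Columns are 1, x, x^2, ..., x^(n-1): the monomial basis.
    monomial = np.vander(x, n, increasing=True)
    # Columns are T_0(x), T_1(x), ..., T_(n-1)(x): the Chebyshev basis.
    chebyshev = chebvander(x, n - 1)
    return np.linalg.cond(monomial), np.linalg.cond(chebyshev)


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        mono, cheb = condition_numbers(n)
        print(f"n={n:2d}  monomial cond={mono:.3e}  Chebyshev cond={cheb:.3e}")
```

Under these assumptions the Chebyshev-Vandermonde matrix stays well conditioned as n grows, because its columns are orthogonal at the Chebyshev nodes, while the condition number of the monomial matrix grows rapidly with n. With stragglers, only a subset of the evaluations would be available at decoding time, a case this sketch does not cover.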
