Variable Coded Batch Matrix Multiplication

In this paper, we introduce the Variable Coded Distributed Batch Matrix Multiplication (VCDBMM) problem, which tasks a distributed system with performing batch matrix multiplication where the matrices are not necessarily distinct across batch jobs. Most prior work on coded matrix-matrix computation has focused on two directions: matrix partitioning for a single computation task, and batch processing of multiple distinct computation tasks. While these works provide codes with good straggler resilience and fast decoding for their respective problem spaces, such codes cannot take advantage of the natural redundancy that arises from re-using matrices across batch jobs. Inspired by Cross-Subspace Alignment codes, we develop Flexible Cross-Subspace Alignment (FCSA) codes that are flexible enough to exploit this redundancy. We provide a full characterization of FCSA codes, which allow for a wide variety of system complexities, including regimes with good straggler resilience and fast decoding. We show theoretically that, under certain practical conditions, FCSA codes are within a factor of two of the optimal straggler resilience; our simulations demonstrate that the codes achieve even smaller optimality gaps in practice.
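
To make the setting concrete, the following minimal Python sketch (not the paper's FCSA construction; the pool sizes, dimensions, and job list are illustrative assumptions) sets up a batch of matrix products in which the same matrices recur across jobs, and contrasts the number of jobs with the number of distinct products, i.e., the redundancy a VCDBMM-aware code can exploit.

```python
# A minimal sketch of the VCDBMM setting (illustrative only, not FCSA codes):
# a batch of products A_i * B_j where the A and B matrices are drawn from
# small pools and therefore repeat across batch jobs.
import numpy as np

rng = np.random.default_rng(0)

# Pools of distinct matrices; many batch jobs reuse the same pool members.
num_A, num_B = 3, 2          # distinct matrices available (assumed values)
m, k, n = 4, 5, 6            # A is m x k, B is k x n (assumed dimensions)
A_pool = [rng.standard_normal((m, k)) for _ in range(num_A)]
B_pool = [rng.standard_normal((k, n)) for _ in range(num_B)]

# A batch of jobs, each naming one A and one B from the pools.
# Note the repetition: e.g. A_pool[0] appears in several jobs.
jobs = [(0, 0), (0, 1), (1, 0), (2, 1), (0, 0)]

# The quantities a VCDBMM scheme must recover at the fusion node.
desired = [A_pool[i] @ B_pool[j] for i, j in jobs]

# A naive baseline: treat every job as distinct and replicate each product
# r times across workers for straggler tolerance. This ignores the fact that
# at most num_A * num_B distinct products can ever occur, which is the
# redundancy a code tailored to VCDBMM can exploit.
r = 2
worker_tasks = [(i, j) for (i, j) in jobs for _ in range(r)]
print(f"{len(jobs)} jobs, {len(set(jobs))} distinct products, "
      f"{len(worker_tasks)} worker tasks under {r}x replication")
```

Under such a replication baseline, the worker load grows with the number of jobs, whereas the number of distinct products bounds what actually needs to be computed; closing that gap while keeping straggler resilience and fast decoding is the design goal described above.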
