Flexible Constructions for Distributed Matrix Multiplication

The distributed matrix multiplication problem with an unknown number of stragglers is considered, where the goal is to allow a master to efficiently and flexibly obtain the product of two massive matrices by distributing the computation across $N$ servers. We assume there are at most $N-R$ stragglers, but the exact number is not known a priori. To reduce latency, a flexible solution is proposed that fully utilizes the computation capability of the available servers. The computing job of each server is separated into two layers, constructed based on the Entangled Polynomial (EP) codes of Yu et al. The final result can be obtained either when a larger number of servers complete their first-layer tasks or when a smaller number of servers complete the tasks of both layers. The required finite field size of the proposed solution is less than $2N$. Moreover, the optimal partitioning of the input matrices is discussed. Our constructions can also be generalized to batch matrix multiplication.
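
To make the straggler setting concrete, the following is a minimal sketch of a plain single-layer polynomial code over a prime field, the basic idea that EP codes build on. It is not the two-layer flexible construction proposed here. The field size `P = 257`, the partition `m = n = 2`, the choice of $N = 6$ servers, the evaluation points, and all function names are illustrative assumptions, not taken from the paper. Each server multiplies one encoded block of $A$ by one encoded block of $B$, and the master recovers $AB$ (modulo `P`) from any $mn = 4$ of the 6 results by polynomial interpolation.

```python
P = 257  # assumed prime field size, chosen only for this illustration

def mat_mul(X, Y):
    """Matrix product with entries reduced modulo P."""
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*Y)]
            for row in X]

def mat_comb(blocks, coeffs):
    """Entry-wise linear combination sum_j coeffs[j] * blocks[j] modulo P."""
    rows, cols = len(blocks[0]), len(blocks[0][0])
    return [[sum(c * blk[r][s] for c, blk in zip(coeffs, blocks)) % P
             for s in range(cols)] for r in range(rows)]

def encode(A_blocks, B_blocks, x):
    """Encoded inputs for the server at point x:
    sum_j A_j x^j  and  sum_k B_k x^(k*m), with m = number of A blocks."""
    m = len(A_blocks)
    A_enc = mat_comb(A_blocks, [pow(x, j, P) for j in range(m)])
    B_enc = mat_comb(B_blocks, [pow(x, k * m, P) for k in range(len(B_blocks))])
    return A_enc, B_enc

def invert_vandermonde(xs):
    """Inverse of V[i][d] = xs[i]^d modulo P, by Gauss-Jordan elimination."""
    n = len(xs)
    aug = [[pow(x, d, P) for d in range(n)] + [int(i == j) for j in range(n)]
           for i, x in enumerate(xs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[piv] = aug[piv], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # modular inverse via Fermat
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % P for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def decode(results, m, n):
    """Recover A*B mod P from a dict {evaluation point: server output}.
    Only the first m*n finished servers are used."""
    xs = list(results)[: m * n]
    V_inv = invert_vandermonde(xs)
    outs = [results[x] for x in xs]
    # The coefficient of x^(j + k*m) in the product polynomial is A_j * B_k.
    coeff = [mat_comb(outs, V_inv[d]) for d in range(m * n)]
    C = []
    for j in range(m):
        for r in range(len(coeff[0])):
            C.append([v for k in range(n) for v in coeff[j + k * m][r]])
    return C

# Toy instance: A is 4x3, B is 3x4.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
B = [[1, 0, 2, 3], [4, 1, 0, 5], [6, 7, 1, 0]]
m = n = 2
A_blocks = [A[:2], A[2:]]                                    # row blocks of A
B_blocks = [[row[:2] for row in B], [row[2:] for row in B]]  # column blocks of B

points = [1, 2, 3, 4, 5, 6]   # one distinct nonzero field element per server
stragglers = {2, 5}           # these servers never respond
results = {}
for x in points:
    if x not in stragglers:
        A_enc, B_enc = encode(A_blocks, B_blocks, x)
        results[x] = mat_mul(A_enc, B_enc)

C = decode(results, m, n)
print(C == mat_mul(A, B))
```

In this sketch the recovery threshold is fixed at $mn$, so any work done by a fifth or sixth responding server is discarded. That wasted effort is what a flexible, layered design aims to recover when fewer servers straggle than the worst case.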
