A locality-based approach for coded computation

Modern distributed computation infrastructures are often plagued by unavailabilities such as failed or slow servers. These unavailabilities adversely affect the tail latency of computation in distributed infrastructures. The simple solution of replicating computation entails significant resource overhead. Coded computation has emerged as a resource-efficient alternative, wherein multiple units of data are encoded to create parity units, and the function to be computed is applied to each of these units on distinct servers. A decoder can then use the available function outputs to decode the unavailable ones. Existing coded computation approaches are resource efficient only for simple variants of linear functions, such as multilinear functions; even the class of low-degree polynomials requires the same multiplicative overhead as replication for practically relevant levels of straggler tolerance. In this paper, we present a new approach that models coded computation through the lens of the locality of codes. We introduce a generalized notion of locality, denoted computational locality, building upon the locality of an appropriately defined code. We show that computational locality is equivalent to the number of workers required for coded computation, and we leverage results from the well-studied locality of codes to design coded computation schemes. We show that recent results on coded computation of multivariate polynomials can be derived using local recovery schemes for Reed-Muller codes. We present coded computation schemes for multivariate polynomials that adaptively exploit locality properties of the input data, a technique inadmissible under existing frameworks. These schemes require fewer workers than the lower bound under existing coded computation frameworks, showing that the existing multiplicative overhead on the number of servers is not fundamental for coded computation of nonlinear functions.
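To make the connection between coded computation and locality concrete, the following is a minimal sketch (our illustration, not the paper's scheme) of the classic local recovery idea behind Reed-Muller codes: to compute f(x) for a low-degree multivariate polynomial f, workers evaluate f at points along a random line through x, and the decoder interpolates the univariate restriction of f to that line. Any degree+1 surviving worker outputs suffice, so extra points along the line buy straggler tolerance. The polynomial f, the field size P, and the worker/straggler counts here are arbitrary choices for the demo.

```python
import random

P = 2_147_483_647  # prime field modulus; an arbitrary choice for this demo
DEGREE = 3         # total degree of the example polynomial below

def f(point):
    # Example degree-3 polynomial in 2 variables (hypothetical workload).
    x, y = point
    return (x * x * y + 3 * x * y + 7) % P

def line(x, v, t):
    # Point x + t*v on the line through x with direction v, over GF(P).
    return [(xi + t * vi) % P for xi, vi in zip(x, v)]

def lagrange_at_zero(ts, vals):
    # Interpolate the univariate restriction g(t) = f(x + t*v) from the
    # sampled (t, g(t)) pairs and evaluate it at t = 0, recovering f(x).
    total = 0
    for i, ti in enumerate(ts):
        num, den = 1, 1
        for j, tj in enumerate(ts):
            if i != j:
                num = num * (-tj) % P
                den = den * (ti - tj) % P
        total = (total + vals[i] * num * pow(den, P - 2, P)) % P
    return total

def coded_compute(x, stragglers=1):
    # Each "worker" evaluates f at one point on the line; any DEGREE+1 of
    # the DEGREE+1+stragglers responses suffice to decode f(x).
    v = [random.randrange(1, P) for _ in x]
    ts = list(range(1, DEGREE + 1 + stragglers + 1))
    outputs = {t: f(line(x, v, t)) for t in ts}          # worker outputs
    survivors = sorted(random.sample(ts, DEGREE + 1))    # stragglers dropped
    return lagrange_at_zero(survivors, [outputs[t] for t in survivors])

x = [5, 9]
assert coded_compute(x) == f(x)
```

In the paper's terminology, the number of workers used here, (degree + 1) plus the straggler budget, is the computational locality of this particular recovery scheme; the schemes in the paper improve on such counts by adapting to locality properties of the input data.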
