Fault-tolerant distributed logistic regression using unreliable components

We study distributed logistic regression computed on unreliable components, allowing faults in both the memory units and the processing units. We show that a real-number coding technique suppresses errors during the computation and guarantees that logistic regression converges with bounded error, provided the number of faults in each iteration is bounded, even when the faults are adversarial. Moreover, because the code operates on real numbers, error correction can be carried out at the algorithmic (block) level, using the results of intermediate steps of logistic regression. Fault tolerance therefore requires redundant hardware only at the block level, not at the circuit level.
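To make the idea of block-level correction with a real-number code concrete, the following is a minimal sketch, not the construction analyzed in this work. It assumes a classic weighted-checksum scheme (in the style of algorithm-based fault tolerance) protecting the linear step z = Xw of one logistic regression iteration, and it assumes at most one additive fault per product. All function names (`encode`, `matvec`, `correct_single_fault`) and the `fault` parameter are hypothetical and introduced only for illustration.

```python
import math


def encode(rows):
    """Append two real-valued checksum rows: a plain sum and an index-weighted sum."""
    n = len(rows[0])
    plain = [sum(r[j] for r in rows) for j in range(n)]
    weighted = [sum((i + 1) * r[j] for i, r in enumerate(rows)) for j in range(n)]
    return rows + [plain, weighted]


def matvec(matrix, x, fault=None):
    """Multiply matrix by x. `fault=(row, error)` injects one additive fault (illustration only)."""
    y = [sum(a * b for a, b in zip(row, x)) for row in matrix]
    if fault is not None:
        row, error = fault
        y[row] += error
    return y


def correct_single_fault(coded_y, tol=1e-9):
    """Locate and remove at most one additive error, then return the data entries."""
    *y, s_plain, s_weighted = coded_y
    # Syndromes: both are (numerically) zero when no fault occurred.
    d_plain = sum(y) - s_plain
    d_weighted = sum((i + 1) * v for i, v in enumerate(y)) - s_weighted
    if abs(d_plain) <= tol:
        return y  # no fault, or the fault hit only the weighted checksum
    k = round(d_weighted / d_plain) - 1  # the syndrome ratio reveals the faulty index
    if 0 <= k < len(y):
        y[k] -= d_plain  # the plain syndrome equals the error magnitude
    return y  # k out of range means the fault hit the plain checksum itself


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    X = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]  # toy data block
    w = [0.2, -0.4]                            # current weights
    faulty = matvec(encode(X), w, fault=(1, 5.0))
    z = correct_single_fault(faulty)
    print([sigmoid(v) for v in z])             # predictions after correction
```

Because the checksums are ordinary real-valued inner products, the check runs on the output of an intermediate step rather than inside the arithmetic circuits, which is the sense in which redundancy is added at the block level. Tolerating more than one fault per iteration, or adversarial fault patterns, needs a stronger code than this two-checksum example.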
