Core Failure Mitigation in Integer Sum-of-Product Computations on Cloud Computing Systems

The decreasing mean-time-to-failure estimates in cloud computing systems indicate that multimedia applications running in such environments should be able to mitigate an increasing number of core failures at runtime. We propose a new roll-forward failure-mitigation approach for integer sum-of-product computations, with emphasis on generic matrix multiplication (GEMM) and convolution/cross-correlation (CONV) routines. Our approach is based on the production of redundant results within the numerical representation of the outputs via the use of numerical packing. This differs from all existing roll-forward solutions, which require a separate set of checksum (or duplicate) results. Our proposal imposes a 37.5% reduction in the maximum supported output bitwidth in comparison to integer sum-of-product realizations performed on 32-bit integer representations (i.e., 20 rather than 32 bits); this is comparable to the bitwidth requirement of checksum methods for multiple core failure mitigation. Experiments with state-of-the-art GEMM and CONV routines running on a c4.8xlarge compute-optimized instance of Amazon Web Services Elastic Compute Cloud (AWS EC2) demonstrate that the proposed approach is able to mitigate up to one quad-core failure while achieving processing throughput that is: 1) comparable to that of the conventional, failure-intolerant integer GEMM and CONV routines, and 2) substantially superior to that of the equivalent roll-forward failure-mitigation method based on checksum streams. Furthermore, when used within an image retrieval framework deployed over a cluster of AWS EC2 spot (i.e., low-cost albeit terminable) instances, our proposal leads to: 1) a 16%-23% cost reduction against the equivalent checksum-based method, and 2) a cost reduction of more than 70% against conventional failure-intolerant processing on AWS EC2 on-demand (i.e., higher-cost albeit guaranteed) instances.
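To make the idea of redundancy inside the numerical representation concrete, the following is a minimal Python sketch under stated assumptions; it is not the authors' packing scheme. It assumes a simple two-field layout in which (hypothetical) core m multiplies the packed block A_m + 2^K·A_{(m+s) mod M} by B. Because GEMM is linear, that single packed output equals A_m·B + 2^K·(A_{(m+s) mod M}·B), so it carries the core's own result in its low field and a redundant copy of another core's result in its high field, with no separate checksum stream. If core m fails, C_m is read from the high field of core (m - s) mod M. The value K = 20 echoes the 20-bit output limit implied by the 37.5% reduction from 32 bits (0.375 × 32 = 12 bits), and the stride s = 4 over two quad-core groups is an illustrative assumption for "one quad-core failure". The function names `pack`, `unpack`, `run_cores` and `recover` are hypothetical. Python integers are unbounded, so the sketch ignores the machine-word container; in C, this particular layout would need roughly 40 bits per packed value (e.g., 64-bit accumulators).

```python
# Sketch of roll-forward core-failure recovery via numerical packing (ASSUMED scheme,
# not the paper's exact layout). Core m multiplies the packed block
# A_m + 2**K * A_{(m+s) % M} by B, so one output word holds its own result (low field)
# plus a redundant copy of core (m+s)'s result (high field). No checksum stream is used.

K = 20  # assumed field width: every entry of A_m @ B must satisfy |c| < 2**(K - 1)


def matmul(a, b):
    """Plain integer GEMM on lists of lists (stands in for an optimized library call)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def pack(lo, hi):
    """Element-wise lo + 2**K * hi; by linearity, pack(lo, hi) @ B = lo@B + 2**K * (hi@B)."""
    return [[l + h * 2**K for l, h in zip(rl, rh)] for rl, rh in zip(lo, hi)]


def unpack(p):
    """Split a packed integer into its signed (low, high) fields."""
    lo = p % 2**K              # Python's % gives a non-negative residue here
    if lo >= 2**(K - 1):       # map back to the signed range [-2**(K-1), 2**(K-1))
        lo -= 2**K
    return lo, (p - lo) >> K   # exact, since p - lo is a multiple of 2**K


def run_cores(blocks, b, failed, stride):
    """Simulate one packed GEMM per core; a failed (fail-stop) core returns None."""
    M = len(blocks)
    return [None if m in failed else matmul(pack(blocks[m], blocks[(m + stride) % M]), b)
            for m in range(M)]


def recover(outputs, stride):
    """Rebuild every C_m = A_m @ B from the surviving packed outputs (roll-forward)."""
    M = len(outputs)
    results = []
    for m in range(M):
        backup = (m - stride) % M  # core whose high field holds a copy of C_m
        if outputs[m] is not None:
            src, field = outputs[m], 0
        elif outputs[backup] is not None:
            src, field = outputs[backup], 1
        else:
            raise RuntimeError(f"cores {m} and {backup} both failed; C_{m} is lost")
        results.append([[unpack(v)[field] for v in row] for row in src])
    return results


# Usage: eight cores viewed as two hypothetical quad-core groups {0..3} and {4..7};
# with stride 4, every core in the failed group has its backup in the surviving group.
blocks = [[[m + i - j for j in range(3)] for i in range(2)] for m in range(8)]
B = [[1, -2], [3, 0], [-1, 4]]
expected = [matmul(a, B) for a in blocks]
outputs = run_cores(blocks, B, failed={4, 5, 6, 7}, stride=4)
print(recover(outputs, stride=4) == expected)  # True
```

The price of this design is the reduced dynamic range per field: any entry of A_m·B outside [-2^19, 2^19) would spill into the neighbouring field and corrupt both results, which is why the maximum output bitwidth has to shrink.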
