What broke where for distributed and parallel applications---a whodunit story

Mitra, Subrata PhD, Purdue University, December 2016. What Broke Where for Distributed and Parallel Applications — a Whodunit Story. Major Professor: Saurabh Bagchi. Detection, diagnosis and mitigation of performance problems in today’s large-scale distributed and parallel systems is a difficult task. These large distributed and parallel systems are composed of various complex software and hardware components. When the system experiences some performance or correctness problem, developers struggle to understand the root cause of the problem and fix in a timely manner. In my thesis, I address these three components of the performance problems in computer systems. First, we focus on diagnosing performance problems in large-scale parallel applications running on supercomputers. We developed techniques to localize the performance problem for root-cause analysis. Parallel applications, most of which are complex scientific simulations running in supercomputers, can create up to millions of parallel tasks that run on different machines and communicate using the message passing paradigm. We developed a highly scalable and accurate automated debugging tool called PRODOMETER, which uses sophisticated algorithms to first, create a logical progress dependency graph of the tasks to highlight how the problem spread through the system manifesting as a system-wide performance issue. Second, uses this logical progress dependence graph to identify the task where the problem originated. Finally, PRODOMETER pinpoints the code region corresponding to the origin of the bug. Second, we developed a tool-chain that can detect performance anomaly using machine-learning techniques and can achieve very low false positive rate. Our inputaware performance anomaly detection system consists of a scalable data collection framework to collect performance related metrics from different granularity of code

[1]  J. R. Quinlan Learning With Continuous Classes , 1992 .

[2]  David M. Brooks,et al.  Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.

[3]  Nihar B. Shah,et al.  Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction , 2010, IEEE Transactions on Information Theory.

[4]  Feng Mao,et al.  Exploiting statistical correlations for proactive prediction of program behaviors , 2010, CGO '10.

[5]  Rodrigo Rodrigues,et al.  High Availability in DHTs: Erasure Coding vs. Replication , 2005, IPTPS.

[6]  Kannan Ramchandran,et al.  A "hitchhiker's" guide to fast and efficient data reconstruction in erasure-coded data centers , 2015, SIGCOMM 2015.

[7]  Saurabh Bagchi,et al.  Dealing with the Unknown: Resilience to Prediction Errors , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).

[8]  Dan Grossman,et al.  Monitoring and Debugging the Quality of Results in Approximate Programs , 2015, ASPLOS.

[9]  Dimitris S. Papailiopoulos,et al.  XORing Elephants: Novel Erasure Codes for Big Data , 2013, Proc. VLDB Endow..

[10]  Martin Schulz,et al.  Large scale debugging of parallel tasks with AutomaDeD , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).

[11]  Dhabaleswar K. Panda,et al.  DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).

[12]  Christof Fetzer,et al.  IncApprox: A Data Analytics System for Incremental Approximate Computing , 2016, WWW.

[13]  David F. Bacon,et al.  Compiler transformations for high-performance computing , 1994, CSUR.

[14]  Kannan Ramchandran,et al.  A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster , 2013, HotStorage.

[15]  Martin C. Rinard Using early phase termination to eliminate load imbalances at barrier synchronization points , 2007, OOPSLA.

[16]  Dimitrios S. Nikolopoulos,et al.  A programming model and runtime system for significance-aware energy-efficient computing , 2015, PPOPP.

[17]  John Cocke,et al.  A program data flow analysis procedure , 1976, CACM.

[18]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[19]  Keshav Pingali,et al.  Proactive Control of Approximate Programs , 2016, ASPLOS.

[20]  Martin C. Rinard,et al.  Verifying quantitative reliability for programs that execute on unreliable hardware , 2013, OOPSLA.

[21]  Van-Anh Truong,et al.  Availability in Globally Distributed Storage Systems , 2010, OSDI.

[22]  Stefan Savage,et al.  Total Recall: System Support for Automated Availability Management , 2004, NSDI.

[23]  Martin C. Rinard,et al.  Approximate computation with outlier detection in Topaz , 2015, OOPSLA.

[24]  Kaushik Roy,et al.  ASLAN: Synthesis of approximate sequential circuits , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[25]  Michael I. Jordan,et al.  Statistical debugging: simultaneous identification of multiple bugs , 2006, ICML '06.

[26]  Mona Attariyan,et al.  X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software , 2012, OSDI.

[27]  Andrew Gordon Wilson,et al.  Fast Kernel Learning for Multidimensional Pattern Extrapolation , 2014, NIPS.

[28]  Sumit Gulwani,et al.  Proving programs robust , 2011, ESEC/FSE '11.

[29]  Luis Ceze,et al.  Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.

[30]  Robbert van Renesse,et al.  Chain Replication for Supporting High Throughput and Availability , 2004, OSDI.

[31]  Henry Hoffmann,et al.  JouleGuard: energy guarantees for approximate applications , 2015, SOSP.

[32]  Thu D. Nguyen,et al.  ApproxHadoop: Bringing Approximations to MapReduce Frameworks , 2015, ASPLOS.

[33]  Gu-Yeon Wei,et al.  HELIX-UP: Relaxing program semantics to unleash parallelization , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[34]  Bruce M. Maggs,et al.  A Universal Approach to Data Center Network Design , 2014, SPAA.

[35]  Armando Fox,et al.  Fingerprinting the datacenter: automated classification of performance crises , 2010, EuroSys '10.

[36]  S.A. Brandt,et al.  CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[37]  Saurabh Bagchi,et al.  Automatic Problem Localization via Multi-dimensional Metric Profiling , 2013, 2013 IEEE 32nd International Symposium on Reliable Distributed Systems.

[38]  Martin Schulz,et al.  Scalable temporal order analysis for large scale debugging , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.

[39]  Scott A. Mahlke,et al.  Input responsiveness: using canary inputs to dynamically steer approximation , 2016, PLDI.

[40]  Mario Blaum,et al.  A Tale of Two Erasure Codes in HDFS , 2015, FAST.

[41]  Nathan R. Tallent,et al.  HPCTOOLKIT: tools for performance analysis of optimized parallel programs , 2010, Concurr. Comput. Pract. Exp..

[42]  Dennis Goeckel,et al.  An adaptive Reed-Solomon errors-and-erasures decoder , 2006, FPGA '06.

[43]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[44]  Shan Lu,et al.  Understanding and detecting real-world performance bugs , 2012, PLDI.

[45]  John C. S. Lui,et al.  Optimal recovery of single disk failure in RDP code storage systems , 2010, SIGMETRICS '10.

[46]  Kannan Ramchandran,et al.  Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth , 2015, FAST.

[47]  Lakshmi Ganesh,et al.  Lazy Means Smart: Reducing Repair Bandwidth Costs in Erasure-coded Distributed Storage , 2014, SYSTOR 2014.

[48]  Andrew Gordon Wilson,et al.  Gaussian Process Kernels for Pattern Discovery and Extrapolation , 2013, ICML.

[49]  Camil Demetrescu,et al.  Input-Sensitive Profiling , 2014, IEEE Trans. Software Eng..

[50]  Alexandros G. Dimakis,et al.  Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[51]  Rupak Majumdar,et al.  Cause clue clauses: error localization using maximum satisfiability , 2010, PLDI '11.

[52]  Bronis R. de Supinski,et al.  Probabilistic diagnosis of performance faults in large-scale parallel applications , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).

[53]  Zeyuan Allen Zhu,et al.  Randomized accuracy-aware program transformations for efficient approximate computations , 2012, POPL '12.

[54]  José Antonio Lozano,et al.  Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[56]  Nicholas J. Wright,et al.  Modeling and predicting application performance on parallel computers using HPC challenge benchmarks , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.

[57]  Scott A. Mahlke,et al.  Rumba: An online quality management system for approximate computing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).

[58]  Martin C. Rinard,et al.  Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.

[59]  Ju Wang,et al.  Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.

[60]  Alexandros G. Dimakis,et al.  Rebuilding for array codes in distributed storage systems , 2010, 2010 IEEE Globecom Workshops.

[61]  Matthias Hauswirth,et al.  Algorithmic profiling , 2012, PLDI.

[62]  Yang Tang,et al.  NCCloud: applying network coding for the storage repair in a cloud-of-clouds , 2012, FAST.

[63]  Henry Hoffmann,et al.  Quality of service profiling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[64]  Martin C. Rinard,et al.  Proving acceptability properties of relaxed nondeterministic approximate programs , 2012, PLDI.

[65]  John Kubiatowicz,et al.  Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.

[66]  Saurabh Bagchi,et al.  Partial-parallel-repair (PPR): a distributed technique for repairing erasure coded storage , 2016, EuroSys.

[67]  Martin Schulz,et al.  A Scalable and Distributed Dynamic Formal Verifier for MPI Programs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[68]  William Gropp,et al.  Collective Error Detection for MPI Collective Operations , 2005, PVM/MPI.

[69]  David Abramson,et al.  Assertion Based Parallel Debugging , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

[70]  Sriram Rao,et al.  A The Quantcast File System , 2013, Proc. VLDB Endow..

[71]  Lluis Pamies-Juarez,et al.  CORE: Cross-object redundancy for efficient data repair in storage systems , 2013, 2013 IEEE International Conference on Big Data.

[72]  Qiang Xu,et al.  ApproxEigen: An approximate computing technique for large-scale eigen-decomposition , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[73]  Å. Björck,et al.  Solution of Vandermonde Systems of Equations , 1970 .

[74]  Martin Schulz,et al.  AutomaDeD: Automata-based debugging for dissimilar parallel tasks , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).

[75]  Saurabh Bagchi,et al.  Phase-aware optimization in approximate computing , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).

[76]  Bronis R. de Supinski,et al.  Automatic fault characterization via abnormality-enhanced classification , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).

[77]  Dimitris S. Papailiopoulos,et al.  Locally Repairable Codes , 2014, IEEE Trans. Inf. Theory.

[78]  Alan Edelman,et al.  Language and compiler support for auto-tuning variable-accuracy algorithms , 2011, International Symposium on Code Generation and Optimization (CGO 2011).

[79]  Hai Liu,et al.  PErasure: A parallel Cauchy Reed-Solomon coding library for GPUs , 2015, 2015 IEEE International Conference on Communications (ICC).

[80]  Amin Vahdat,et al.  A scalable, commodity data center network architecture , 2008, SIGCOMM '08.

[81]  Carl D. Meyer,et al.  Deeper Inside PageRank , 2004, Internet Math..

[82]  Keshav Pingali,et al.  Tiling Imperfectly-nested Loop Nests , 2000, ACM/IEEE SC 2000 Conference (SC'00).

[83]  Andreas Haeberlen,et al.  Efficient Replica Maintenance for Distributed Storage Systems , 2006, NSDI.

[84]  Dan Grossman,et al.  EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.

[85]  Donald B. Johnson,et al.  Finding All the Elementary Circuits of a Directed Graph , 1975, SIAM J. Comput..

[86]  F. MacWilliams,et al.  The Theory of Error-Correcting Codes , 1977 .

[87]  Corporate The MPI Forum,et al.  MPI: a message passing interface , 1993, Supercomputing '93.

[88]  Peter D. Düben,et al.  On the use of inexact, pruned hardware in atmospheric modelling , 2014, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[89]  Henry Hoffmann,et al.  Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.

[90]  Stephen McCamant,et al.  The Daikon system for dynamic detection of likely invariants , 2007, Sci. Comput. Program..

[91]  Baochun Li,et al.  Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems , 2017, IEEE Transactions on Parallel and Distributed Systems.

[92]  Rajeev Thakur,et al.  Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..

[93]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools (2nd Edition) , 2006 .

[94]  Jiawei Han,et al.  Modeling Probabilistic Measurement Correlations for Problem Determination in Large-Scale Distributed Systems , 2009, 2009 29th IEEE International Conference on Distributed Computing Systems.

[95]  W. Haque Concurrent Deadlock Detection In Parallel Programs , 2006 .

[96]  Martin C. Rinard Probabilistic accuracy bounds for fault-tolerant computations that discard tasks , 2006, ICS '06.

[97]  James Demmel,et al.  Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.

[98]  Xiaojin Zhu,et al.  Statistical Debugging Using Latent Topic Models , 2007, ECML.

[99]  Jie Han,et al.  Approximate computing: An emerging paradigm for energy-efficient design , 2013, 2013 18th IEEE European Test Symposium (ETS).

[100]  Barton P. Miller,et al.  Problem Diagnosis in Large-Scale Computing Environments , 2006, ACM/IEEE SC 2006 Conference (SC'06).

[101]  Ben Y. Zhao,et al.  OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.

[102]  Sudheendra Hangal,et al.  Tracking down software bugs using automatic anomaly detection , 2002, ICSE '02.

[103]  Lihao Xu,et al.  Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Network Storage Applications , 2006, Fifth IEEE International Symposium on Network Computing and Applications (NCA'06).

[104]  Martin Schulz,et al.  Accurate application progress analysis for large-scale parallel debugging , 2014, PLDI.

[105]  Jacob Nelson,et al.  Approximate storage in solid-state memories , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[106]  Catherine D. Schuman,et al.  A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage , 2009, FAST.

[107]  Cheng Huang,et al.  Erasure Coding in Windows Azure Storage , 2012, USENIX Annual Technical Conference.

[108]  Kalyan Veeramachaneni,et al.  Autotuning algorithmic choice for input sensitivity , 2015, PLDI.

[109]  Michael Mitzenmacher,et al.  Detecting Novel Associations in Large Data Sets , 2011, Science.

[110]  Henry Hoffmann,et al.  Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.

[111]  Utpal Banerjee,et al.  Loop Transformations for Restructuring Compilers: The Foundations , 1993, Springer US.

[112]  M. Rogers Analytic Solutions for the Blast-Wave Problem with an Atmosphere of Varying Density. , 1957 .

[113]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[114]  Christopher Stewart,et al.  Exploiting nonstationarity for performance prediction , 2007, EuroSys '07.

[115]  Cory Hill,et al.  f4: Facebook's Warm BLOB Storage System , 2014, OSDI.

[116]  Ben Y. Zhao,et al.  Pond: The OceanStore Prototype , 2003, FAST.

[117]  Cheng Huang,et al.  Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads , 2012, FAST.

[118]  Scott A. Mahlke,et al.  SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[119]  Kaushik Roy,et al.  Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency , 2010, Design Automation Conference.

[120]  Qiang Xu,et al.  ApproxIt: An approximate computing framework for iterative methods , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).

[121]  Riccardo Poli,et al.  Particle swarm optimization , 1995, Swarm Intelligence.

[122]  Alexander Aiken,et al.  Scalable error detection using boolean satisfiability , 2005, POPL '05.

[123]  Scott A. Mahlke,et al.  Paraprox: pattern-based approximation for data parallel applications , 2014, ASPLOS.

[124]  Dirk Grunwald,et al.  Identifying potential parallelism via loop-centric profiling , 2007, CF '07.

[125]  Trishul M. Chilimbi,et al.  HOLMES: Effective statistical debugging via efficient path profiling , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[126]  Woongki Baek,et al.  Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.

[127]  Alan Edelman,et al.  PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.

[128]  Martin C. Rinard,et al.  Automatically identifying critical input regions and code in applications , 2010, ISSTA '10.

[129]  H. Michael Ji An optimized processor for fast Reed-Solomon encoding and decoding , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.