What broke where for distributed and parallel applications---a whodunit story
[1] J. R. Quinlan. Learning With Continuous Classes , 1992.
[2] David M. Brooks,et al. Accurate and efficient regression modeling for microarchitectural performance and power prediction , 2006, ASPLOS XII.
[3] Nihar B. Shah,et al. Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction , 2010, IEEE Transactions on Information Theory.
[4] Feng Mao,et al. Exploiting statistical correlations for proactive prediction of program behaviors , 2010, CGO '10.
[5] Rodrigo Rodrigues,et al. High Availability in DHTs: Erasure Coding vs. Replication , 2005, IPTPS.
[6] Kannan Ramchandran,et al. A "hitchhiker's" guide to fast and efficient data reconstruction in erasure-coded data centers , 2015, SIGCOMM 2015.
[7] Saurabh Bagchi,et al. Dealing with the Unknown: Resilience to Prediction Errors , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[8] Dan Grossman,et al. Monitoring and Debugging the Quality of Results in Approximate Programs , 2015, ASPLOS.
[9] Dimitris S. Papailiopoulos,et al. XORing Elephants: Novel Erasure Codes for Big Data , 2013, Proc. VLDB Endow..
[10] Martin Schulz,et al. Large scale debugging of parallel tasks with AutomaDeD , 2011, 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[11] Dhabaleswar K. Panda,et al. DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[12] Christof Fetzer,et al. IncApprox: A Data Analytics System for Incremental Approximate Computing , 2016, WWW.
[13] David F. Bacon,et al. Compiler transformations for high-performance computing , 1994, CSUR.
[14] Kannan Ramchandran,et al. A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster , 2013, HotStorage.
[15] Martin C. Rinard. Using early phase termination to eliminate load imbalances at barrier synchronization points , 2007, OOPSLA.
[16] Dimitrios S. Nikolopoulos,et al. A programming model and runtime system for significance-aware energy-efficient computing , 2015, PPOPP.
[17] John Cocke,et al. A program data flow analysis procedure , 1976, CACM.
[18] Hairong Kuang,et al. The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[19] Keshav Pingali,et al. Proactive Control of Approximate Programs , 2016, ASPLOS.
[20] Martin C. Rinard,et al. Verifying quantitative reliability for programs that execute on unreliable hardware , 2013, OOPSLA.
[21] Van-Anh Truong,et al. Availability in Globally Distributed Storage Systems , 2010, OSDI.
[22] Stefan Savage,et al. Total Recall: System Support for Automated Availability Management , 2004, NSDI.
[23] Martin C. Rinard,et al. Approximate computation with outlier detection in Topaz , 2015, OOPSLA.
[24] Kaushik Roy,et al. ASLAN: Synthesis of approximate sequential circuits , 2014, 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[25] Michael I. Jordan,et al. Statistical debugging: simultaneous identification of multiple bugs , 2006, ICML '06.
[26] Mona Attariyan,et al. X-ray: Automating Root-Cause Diagnosis of Performance Anomalies in Production Software , 2012, OSDI.
[27] Andrew Gordon Wilson,et al. Fast Kernel Learning for Multidimensional Pattern Extrapolation , 2014, NIPS.
[28] Sumit Gulwani,et al. Proving programs robust , 2011, ESEC/FSE '11.
[29] Luis Ceze,et al. Architecture support for disciplined approximate programming , 2012, ASPLOS XVII.
[30] Robbert van Renesse,et al. Chain Replication for Supporting High Throughput and Availability , 2004, OSDI.
[31] Henry Hoffmann,et al. JouleGuard: energy guarantees for approximate applications , 2015, SOSP.
[32] Thu D. Nguyen,et al. ApproxHadoop: Bringing Approximations to MapReduce Frameworks , 2015, ASPLOS.
[33] Gu-Yeon Wei,et al. HELIX-UP: Relaxing program semantics to unleash parallelization , 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[34] Bruce M. Maggs,et al. A Universal Approach to Data Center Network Design , 2014, SPAA.
[35] Armando Fox,et al. Fingerprinting the datacenter: automated classification of performance crises , 2010, EuroSys '10.
[36] S.A. Brandt,et al. CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[37] Saurabh Bagchi,et al. Automatic Problem Localization via Multi-dimensional Metric Profiling , 2013, 2013 IEEE 32nd International Symposium on Reliable Distributed Systems.
[38] Martin Schulz,et al. Scalable temporal order analysis for large scale debugging , 2009, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis.
[39] Scott A. Mahlke,et al. Input responsiveness: using canary inputs to dynamically steer approximation , 2016, PLDI.
[40] Mario Blaum,et al. A Tale of Two Erasure Codes in HDFS , 2015, FAST.
[41] Nathan R. Tallent,et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs , 2010, Concurr. Comput. Pract. Exp..
[42] Dennis Goeckel,et al. An adaptive Reed-Solomon errors-and-erasures decoder , 2006, FPGA '06.
[43] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[44] Shan Lu,et al. Understanding and detecting real-world performance bugs , 2012, PLDI.
[45] John C. S. Lui,et al. Optimal recovery of single disk failure in RDP code storage systems , 2010, SIGMETRICS '10.
[46] Kannan Ramchandran,et al. Having Your Cake and Eating It Too: Jointly Optimal Erasure Codes for I/O, Storage, and Network-bandwidth , 2015, FAST.
[47] Lakshmi Ganesh,et al. Lazy Means Smart: Reducing Repair Bandwidth Costs in Erasure-coded Distributed Storage , 2014, SYSTOR 2014.
[48] Andrew Gordon Wilson,et al. Gaussian Process Kernels for Pattern Discovery and Extrapolation , 2013, ICML.
[49] Camil Demetrescu,et al. Input-Sensitive Profiling , 2014, IEEE Trans. Software Eng..
[50] Alexandros G. Dimakis,et al. Network Coding for Distributed Storage Systems , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.
[51] Rupak Majumdar,et al. Cause clue clauses: error localization using maximum satisfiability , 2011, PLDI '11.
[52] Bronis R. de Supinski,et al. Probabilistic diagnosis of performance faults in large-scale parallel applications , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[53] Zeyuan Allen Zhu,et al. Randomized accuracy-aware program transformations for efficient approximate computations , 2012, POPL '12.
[54] José Antonio Lozano,et al. Sensitivity Analysis of k-Fold Cross Validation in Prediction Error Estimation , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[55] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[56] Nicholas J. Wright,et al. Modeling and predicting application performance on parallel computers using HPC challenge benchmarks , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[57] Scott A. Mahlke,et al. Rumba: An online quality management system for approximate computing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[58] Martin C. Rinard,et al. Chisel: reliability- and accuracy-aware optimization of approximate computational kernels , 2014, OOPSLA.
[59] Ju Wang,et al. Windows Azure Storage: a highly available cloud storage service with strong consistency , 2011, SOSP.
[60] Alexandros G. Dimakis,et al. Rebuilding for array codes in distributed storage systems , 2010, 2010 IEEE Globecom Workshops.
[61] Matthias Hauswirth,et al. Algorithmic profiling , 2012, PLDI.
[62] Yang Tang,et al. NCCloud: applying network coding for the storage repair in a cloud-of-clouds , 2012, FAST.
[63] Henry Hoffmann,et al. Quality of service profiling , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.
[64] Martin C. Rinard,et al. Proving acceptability properties of relaxed nondeterministic approximate programs , 2012, PLDI.
[65] John Kubiatowicz,et al. Erasure Coding Vs. Replication: A Quantitative Comparison , 2002, IPTPS.
[66] Saurabh Bagchi,et al. Partial-parallel-repair (PPR): a distributed technique for repairing erasure coded storage , 2016, EuroSys.
[67] Martin Schulz,et al. A Scalable and Distributed Dynamic Formal Verifier for MPI Programs , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[68] William Gropp,et al. Collective Error Detection for MPI Collective Operations , 2005, PVM/MPI.
[69] David Abramson,et al. Assertion Based Parallel Debugging , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[70] Sriram Rao,et al. The Quantcast File System , 2013, Proc. VLDB Endow..
[71] Lluis Pamies-Juarez,et al. CORE: Cross-object redundancy for efficient data repair in storage systems , 2013, 2013 IEEE International Conference on Big Data.
[72] Qiang Xu,et al. ApproxEigen: An approximate computing technique for large-scale eigen-decomposition , 2015, 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).
[73] Å. Björck,et al. Solution of Vandermonde Systems of Equations , 1970.
[74] Martin Schulz,et al. AutomaDeD: Automata-based debugging for dissimilar parallel tasks , 2010, 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN).
[75] Saurabh Bagchi,et al. Phase-aware optimization in approximate computing , 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[76] Bronis R. de Supinski,et al. Automatic fault characterization via abnormality-enhanced classification , 2012, IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012).
[77] Dimitris S. Papailiopoulos,et al. Locally Repairable Codes , 2014, IEEE Trans. Inf. Theory.
[78] Alan Edelman,et al. Language and compiler support for auto-tuning variable-accuracy algorithms , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[79] Hai Liu,et al. PErasure: A parallel Cauchy Reed-Solomon coding library for GPUs , 2015, 2015 IEEE International Conference on Communications (ICC).
[80] Amin Vahdat,et al. A scalable, commodity data center network architecture , 2008, SIGCOMM '08.
[81] Carl D. Meyer,et al. Deeper Inside PageRank , 2004, Internet Math..
[82] Keshav Pingali,et al. Tiling Imperfectly-nested Loop Nests , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[83] Andreas Haeberlen,et al. Efficient Replica Maintenance for Distributed Storage Systems , 2006, NSDI.
[84] Dan Grossman,et al. EnerJ: approximate data types for safe and general low-power computation , 2011, PLDI '11.
[85] Donald B. Johnson,et al. Finding All the Elementary Circuits of a Directed Graph , 1975, SIAM J. Comput..
[86] F. MacWilliams,et al. The Theory of Error-Correcting Codes , 1977.
[87] The MPI Forum. MPI: a message passing interface , 1993, Supercomputing '93.
[88] Peter D. Düben,et al. On the use of inexact, pruned hardware in atmospheric modelling , 2014, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences.
[89] Henry Hoffmann,et al. Managing performance vs. accuracy trade-offs with loop perforation , 2011, ESEC/FSE '11.
[90] Stephen McCamant,et al. The Daikon system for dynamic detection of likely invariants , 2007, Sci. Comput. Program..
[91] Baochun Li,et al. Beehive: Erasure Codes for Fixing Multiple Failures in Distributed Storage Systems , 2017, IEEE Transactions on Parallel and Distributed Systems.
[92] Rajeev Thakur,et al. Optimization of Collective Communication Operations in MPICH , 2005, Int. J. High Perform. Comput. Appl..
[93] Alfred V. Aho,et al. Compilers: Principles, Techniques, and Tools (2nd Edition) , 2006.
[94] Jiawei Han,et al. Modeling Probabilistic Measurement Correlations for Problem Determination in Large-Scale Distributed Systems , 2009, 2009 29th IEEE International Conference on Distributed Computing Systems.
[95] W. Haque. Concurrent Deadlock Detection In Parallel Programs , 2006.
[96] Martin C. Rinard. Probabilistic accuracy bounds for fault-tolerant computations that discard tasks , 2006, ICS '06.
[97] James Demmel,et al. Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology , 1997, ICS '97.
[98] Xiaojin Zhu,et al. Statistical Debugging Using Latent Topic Models , 2007, ECML.
[99] Jie Han,et al. Approximate computing: An emerging paradigm for energy-efficient design , 2013, 2013 18th IEEE European Test Symposium (ETS).
[100] Barton P. Miller,et al. Problem Diagnosis in Large-Scale Computing Environments , 2006, ACM/IEEE SC 2006 Conference (SC'06).
[101] Ben Y. Zhao,et al. OceanStore: an architecture for global-scale persistent storage , 2000, SIGP.
[102] Sudheendra Hangal,et al. Tracking down software bugs using automatic anomaly detection , 2002, ICSE '02.
[103] Lihao Xu,et al. Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Network Storage Applications , 2006, Fifth IEEE International Symposium on Network Computing and Applications (NCA'06).
[104] Martin Schulz,et al. Accurate application progress analysis for large-scale parallel debugging , 2014, PLDI.
[105] Jacob Nelson,et al. Approximate storage in solid-state memories , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[106] Catherine D. Schuman,et al. A Performance Evaluation and Examination of Open-Source Erasure Coding Libraries for Storage , 2009, FAST.
[107] Cheng Huang,et al. Erasure Coding in Windows Azure Storage , 2012, USENIX Annual Technical Conference.
[108] Kalyan Veeramachaneni,et al. Autotuning algorithmic choice for input sensitivity , 2015, PLDI.
[109] Michael Mitzenmacher,et al. Detecting Novel Associations in Large Data Sets , 2011, Science.
[110] Henry Hoffmann,et al. Dynamic knobs for responsive power-aware computing , 2011, ASPLOS XVI.
[111] Utpal Banerjee,et al. Loop Transformations for Restructuring Compilers: The Foundations , 1993, Springer US.
[112] M. Rogers. Analytic Solutions for the Blast-Wave Problem with an Atmosphere of Varying Density , 1957.
[113] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996.
[114] Christopher Stewart,et al. Exploiting nonstationarity for performance prediction , 2007, EuroSys '07.
[115] Cory Hill,et al. f4: Facebook's Warm BLOB Storage System , 2014, OSDI.
[116] Ben Y. Zhao,et al. Pond: The OceanStore Prototype , 2003, FAST.
[117] Cheng Huang,et al. Rethinking erasure codes for cloud file systems: minimizing I/O for recovery and degraded reads , 2012, FAST.
[118] Scott A. Mahlke,et al. SAGE: Self-tuning approximation for graphics engines , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[119] Kaushik Roy,et al. Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency , 2010, Design Automation Conference.
[120] Qiang Xu,et al. ApproxIt: An approximate computing framework for iterative methods , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).
[121] Riccardo Poli,et al. Particle swarm optimization , 1995, Swarm Intelligence.
[122] Alexander Aiken,et al. Scalable error detection using boolean satisfiability , 2005, POPL '05.
[123] Scott A. Mahlke,et al. Paraprox: pattern-based approximation for data parallel applications , 2014, ASPLOS.
[124] Dirk Grunwald,et al. Identifying potential parallelism via loop-centric profiling , 2007, CF '07.
[125] Trishul M. Chilimbi,et al. HOLMES: Effective statistical debugging via efficient path profiling , 2009, 2009 IEEE 31st International Conference on Software Engineering.
[126] Woongki Baek,et al. Green: a framework for supporting energy-conscious programming using controlled approximation , 2010, PLDI '10.
[127] Alan Edelman,et al. PetaBricks: a language and compiler for algorithmic choice , 2009, PLDI '09.
[128] Martin C. Rinard,et al. Automatically identifying critical input regions and code in applications , 2010, ISSTA '10.
[129] H. Michael Ji. An optimized processor for fast Reed-Solomon encoding and decoding , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.