Hound: Causal Learning for Datacenter-scale Straggler Diagnosis
[1] J. Lunceford, et al. Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study, 2004, Statistics in Medicine.
[2] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.
[3] Gregory R. Ganger, et al. Diagnosing Performance Changes by Comparing Request Flows, 2011, NSDI.
[4] Gregory R. Ganger, et al. Ironmodel: robust performance models in the wild, 2008, SIGMETRICS '08.
[5] Sanjay Ghemawat, et al. The Google file system, 2003.
[6] Yu Luo, et al. lprof: A Non-intrusive Request Flow Profiler for Distributed Systems, 2014, OSDI.
[7] Harald Steck, et al. Learning the Bayesian Network Structure: Dirichlet Prior versus Data, 2008, UAI 2008.
[8] B. Schweizer, et al. On Nonparametric Measures of Dependence for Random Variables, 1981.
[9] Srinivasan Seshan, et al. Developing a predictive model of quality of experience for internet video, 2013, SIGCOMM.
[10] Gregory F. Cooper, et al. The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks, 1990, Artif. Intell.
[11] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.
[12] Qi Zhao, et al. Towards automated performance diagnosis in a large IPTV network, 2009, SIGCOMM '09.
[13] Jeffrey S. Chase, et al. Correlating Instrumentation Data to System States: A Building Block for Automated Diagnosis and Control, 2004, OSDI.
[14] Bill Ravens, et al. An Introduction to Copulas, 2000, Technometrics.
[15] David H. Wolpert, et al. No free lunch theorems for optimization, 1997, IEEE Trans. Evol. Comput.
[16] Joseph K. Bradley, et al. Spark SQL: Relational Data Processing in Spark, 2015, SIGMOD Conference.
[17] Magdalena Balazinska, et al. Skew-resistant parallel processing of feature-extracting scientific user-defined functions, 2010, SoCC '10.
[18] Armando Fox, et al. HiLighter: Automatically Building Robust Signatures of Performance Behavior for Small- and Large-Scale Systems, 2008, SysML.
[19] Armando Fox, et al. Capturing, indexing, clustering, and retrieving system history, 2005, SOSP '05.
[20] Joseph L. Hellerstein, et al. Obfuscatory obscanturism: Making workload traces of commercially-sensitive systems safe to release, 2012, 2012 IEEE Network Operations and Management Symposium.
[21] Luiz André Barroso, et al. The tail at scale, 2013, CACM.
[22] C. E. Shannon. A Mathematical Theory of Communication, 1948, Bell System Technical Journal.
[23] Eric A. Brewer, et al. Pinpoint: problem determination in large, dynamic Internet services, 2002, Proceedings International Conference on Dependable Systems and Networks.
[24] Stefan Szeider, et al. Algorithms and Complexity Results for Exact Bayesian Structure Learning, 2010, UAI.
[25] Armando Fox, et al. Ensembles of models for automated diagnosis of system performance problems, 2005, 2005 International Conference on Dependable Systems and Networks (DSN'05).
[26] Scott Shenker, et al. Making Sense of Performance in Data Analytics Frameworks, 2015, NSDI.
[27] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[28] Magdalena Balazinska, et al. SkewTune: mitigating skew in mapreduce applications, 2012, SIGMOD Conference.
[29] Randy H. Katz, et al. Improving MapReduce Performance in Heterogeneous Environments, 2008, OSDI.
[30] Damaris Zurell, et al. Collinearity: a review of methods to deal with it and a simulation study evaluating their performance, 2013.
[31] Lingjia Tang, et al. Treadmill: Attributing the Source of Tail Latency through Precise Load Testing and Statistical Inference, 2016, ISCA.
[32] Elias Bareinboim, et al. Controlling Selection Bias in Causal Inference, 2011, AISTATS.
[33] Armando Fox, et al. Fingerprinting the datacenter: automated classification of performance crises, 2010, EuroSys '10.
[34] Randy H. Katz, et al. Wrangler: Predictable and Faster Jobs using Fewer Resources, 2014, SoCC.
[35] Raghunath Othayoth Nambiar, et al. The making of TPC-DS, 2006, VLDB.
[36] H. Zou, et al. Regularization and variable selection via the elastic net, 2005.
[37] Randy H. Katz, et al. Multi-Task Learning for Straggler Avoiding Predictive Job Scheduling, 2016, J. Mach. Learn. Res.
[38] Donald Beaver, et al. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, 2010.
[39] Adam Wierman, et al. Hopper: Decentralized Speculation-aware Cluster Scheduling at Scale, 2015, SIGCOMM.
[40] Yuqing Zhu, et al. BigDataBench: A big data benchmark suite from internet services, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[41] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.
[42] Lance M. Berc, et al. Continuous profiling: where have all the cycles gone?, 1997, ACM Trans. Comput. Syst.
[43] Rodrigo Fonseca, et al. Pivot tracing, 2018, USENIX ATC.
[44] Barnabás Póczos, et al. Copula-based Kernel Dependency Measures, 2012, ICML.
[45] Randy H. Katz, et al. X-Trace: A Pervasive Network Tracing Framework, 2007, NSDI.
[46] Albert G. Greenberg, et al. Reining in the Outliers in Map-Reduce Clusters using Mantri, 2010, OSDI.
[47] Michael I. Jordan, et al. Detecting large-scale system problems by mining console logs, 2009, SOSP '09.
[48] Thomas F. Wenisch, et al. The Mystery Machine: End-to-end Performance Analysis of Large-scale Internet Services, 2014, OSDI.
[49] Scott Shenker, et al. Effective Straggler Mitigation: Attack of the Clones, 2013, NSDI.
[50] Mirco Nanni, et al. Speeding-Up Hierarchical Agglomerative Clustering in Presence of Expensive Metrics, 2005, PAKDD.
[51] Salvatore J. Stolfo, et al. Experiments on multistrategy learning by meta-learning, 1993, CIKM '93.
[52] Jesús Muñoz, et al. Comparison of statistical methods commonly used in predictive modelling, 2004.
[53] D. Rubin, et al. The central role of the propensity score in observational studies for causal effects, 1983.
[54] David M. Blei, et al. Probabilistic topic models, 2012, Commun. ACM.
[55] Xiao Zhang, et al. CPI2: CPU performance isolation for shared compute clusters, 2013, EuroSys '13.
[56] Mostafa Ammar, et al. Answering what-if deployment and configuration questions with WISE, 2008.
[57] Peter Nobel, et al. Practical performance models for complex, popular applications, 2010, SIGMETRICS '10.
[58] Eshcar Hillel, et al. Predicting Execution Bottlenecks in Map-Reduce Clusters, 2012, HotCloud.
[59] Praveen K. Kopalle, et al. The impact of collinearity on regression analysis: the asymmetric effect of negative and positive correlations, 2002.
[60] Reza Modarres, et al. Measures of Dependence, 2011, International Encyclopedia of Statistical Science.
[61] Richard Mortier, et al. Using Magpie for Request Extraction and Workload Modelling, 2004, OSDI.
[62] Jennifer Neville, et al. Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems, 2012, NSDI.
[63] Harry Zhang, et al. A Fast Decision Tree Learning Algorithm, 2006, AAAI.
[64] Gerard de Haan, et al. Comparison of machine learning techniques for target detection, 2012, Artificial Intelligence Review.
[65] Seunghak Lee, et al. Exploiting Bounded Staleness to Speed Up Big Data Analytics, 2014, USENIX Annual Technical Conference.
[66] D. Pregibon. Resistant fits for some commonly used logistic models with medical application, 1982, Biometrics.
[67] Bernhard Schölkopf, et al. The Randomized Dependence Coefficient, 2013, NIPS.
[68] Jialin Li, et al. Tales of the Tail: Hardware, OS, and Application-level Sources of Tail Latency, 2014, SoCC.
[69] Suzana de Siqueira Santos, et al. A comparative study of statistical methods used to identify dependencies between gene expression signals, 2014, Briefings Bioinform.
[70] Judea Pearl, et al. Graphical Condition for Identification in recursive SEM, 2006, UAI.
[71] Francis R. Bach, et al. Online Learning for Latent Dirichlet Allocation, 2010, NIPS.
[72] Gang Ren, et al. Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers, 2010, IEEE Micro.
[73] Susan L. Graham, et al. Gprof: A call graph execution profiler, 1982, SIGPLAN '82.
[74] Sheng Ma, et al. Adaptive diagnosis in distributed systems, 2005, IEEE Transactions on Neural Networks.
[75] F. Wilcoxon. Individual Comparisons by Ranking Methods, 1945.