Localizing Faults in Cloud Systems
Leonardo Mariani | Mauro Pezzè | Oliviero Riganelli | Cristina Monni | Rui Xin