Graph-based Incident Aggregation for Large-Scale Online Service Systems
Michael R. Lyu | Hongyu Zhang | Zhuangbin Chen | Jinyang Liu | Yuxin Su | Xiao Ling | Yongqiang Yang | Xuemin Wen
[1] Steven Skiena,et al. DeepWalk: online learning of social representations , 2014, KDD.
[2] Hang Dong,et al. Identifying linked incidents in large-scale online service systems , 2020, ESEC/SIGSOFT FSE.
[3] Leon Moonen,et al. Improving problem identification via automated log clustering using dimensionality reduction , 2018, ESEM.
[4] Ping Wang,et al. Lightweight and Adaptive Service API Performance Monitoring in Highly Dynamic Cloud Environment , 2017, 2017 IEEE International Conference on Services Computing (SCC).
[5] Zhao Yang,et al. A Comparative Analysis of Community Detection Algorithms on Artificial Networks , 2016, Scientific Reports.
[6] Dongmei Zhang,et al. An Empirical Investigation of Incident Triage for Online Service Systems , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP).
[7] Xiaohui Nie,et al. Understanding and Handling Alert Storm for Online Service Systems , 2020, 2020 IEEE/ACM 42nd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP).
[8] Eamonn Keogh. Exact Indexing of Dynamic Time Warping , 2002, VLDB.
[9] Zhuangbin Chen,et al. AIOps Innovations in Incident Management for Cloud Services , 2020 .
[10] Regunathan Radhakrishnan,et al. Unveiling clusters of events for alert and incident management in large-scale enterprise IT , 2014, KDD.
[11] Zhiyuan Liu,et al. Graph Neural Networks: A Review of Methods and Applications , 2018, AI Open.
[12] Haifeng Chen,et al. Ranking the importance of alerts for problem determination in large computer systems , 2009, ICAC '09.
[13] Sushil Jajodia,et al. NSDMiner: Automated discovery of Network Service Dependencies , 2012, 2012 Proceedings IEEE INFOCOM.
[14] Jure Leskovec,et al. node2vec: Scalable Feature Learning for Networks , 2016, KDD.
[15] Jian Pei,et al. Mining frequent patterns without candidate generation , 2000, SIGMOD '00.
[16] Zhou Wang,et al. Real-time incident prediction for online service systems , 2020, ESEC/SIGSOFT FSE.
[17] Shenglin Zhang,et al. FluxRank: A Widely-Deployable Framework to Automatically Localizing Root Cause Machines for Software Service Failure Mitigation , 2019, 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE).
[18] Shang-Pin Ma,et al. Using Service Dependency Graph to Analyze and Test Microservices , 2018, 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC).
[19] Jure Leskovec,et al. Representation Learning on Graphs: Methods and Applications , 2017, IEEE Data Eng. Bull.
[20] Dan Pei,et al. Automatically and Adaptively Identifying Severe Alerts for Online Service Systems , 2020, IEEE INFOCOM 2020 - IEEE Conference on Computer Communications.
[21] Zibin Zheng,et al. Logzip: Extracting Hidden Structures via Iterative Clustering for Log Compression , 2019, 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE).
[22] Junjie Chen,et al. Continuous Incident Triage for Large-Scale Online Service Systems , 2019, 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE).
[23] L. Haan,et al. Extreme value theory : an introduction , 2006 .
[24] Qingwei Lin,et al. Efficient incident identification from multi-dimensional issue reports via meta-heuristic search , 2020, ESEC/SIGSOFT FSE.
[25] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[26] Yu Kang,et al. Towards intelligent incident management: why we need it and how we make it , 2020, ESEC/SIGSOFT FSE.
[27] Peng Huang,et al. Gray Failure: The Achilles' Heel of Cloud-Scale Systems , 2017, HotOS.
[28] Feifei Li,et al. Adaptive log compression for massive log data , 2013, SIGMOD '13.
[29] Tomas Mikolov,et al. Advances in Pre-Training Distributed Word Representations , 2017, LREC.
[30] Behnaz Arzani,et al. Scouts: Improving the Diagnosis Process Through Domain-customized Incident Routing , 2020, SIGCOMM.
[31] Qiang Fu,et al. Identifying Recurrent and Unknown Performance Issues , 2014, 2014 IEEE International Conference on Data Mining.
[32] Yu Zhang,et al. Log Clustering Based Problem Identification for Online Service Systems , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C).
[33] Dongmei Zhang,et al. Identifying impactful service system problems via log analysis , 2018, ESEC/SIGSOFT FSE.
[34] Jean-Loup Guillaume,et al. Fast unfolding of communities in large networks , 2008, arXiv:0803.0476.
[35] Michael J. Kavis,et al. Architecting the Cloud: Design Decisions for Cloud Computing Service Models (Saas, Paas, and Iaas) , 2014 .
[36] Valentino Constantinou,et al. Detecting Spacecraft Anomalies Using LSTMs and Nonparametric Dynamic Thresholding , 2018, KDD.
[37] Hang Dong,et al. Outage Prediction and Diagnosis for Cloud Service Systems , 2019, WWW.
[38] Hans-Peter Kriegel,et al. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise , 1996, KDD.
[39] Shilin He,et al. Characterizing the Natural Language Descriptions in Software Logging Statements , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).
[40] Matthijs Douze,et al. FastText.zip: Compressing text classification models , 2016, ArXiv.
[41] Dongmei Zhang,et al. Predicting Node failure in cloud service systems , 2018, ESEC/SIGSOFT FSE.
[42] Alexandre Termier,et al. Anomaly Detection in Streams with Extreme Value Theory , 2017, KDD.