AIOps: Real-World Challenges and Research Innovations

AIOps is about empowering software and service engineers (e.g., developers, program managers, support engineers, site reliability engineers) to efficiently and effectively build and operate online services and apps at scale with artificial intelligence (AI) and machine learning (ML) techniques. AIOps can help achieve higher service quality and customer satisfaction, higher engineering productivity, and lower cost. In this technical briefing, we summarize the real-world challenges of building AIOps solutions based on our practice and experience at Microsoft, propose a roadmap of AIOps-related research directions, and share a few successful AIOps solutions we have built for Microsoft service products.
