IT Incident Management by Analyzing Incident Relations

IT incident management aims to maintain high levels of service quality and availability by restoring normal service operations as quickly as possible and minimizing business impact. Enterprises often maintain many applications to support their business. Diagnosing incidents at the application level is a significant challenge because their causes are complicated and often aggregate from the shared IT environment, network, hardware, software, and changes. In this paper, we present a new approach to diagnosing application incidents by effectively searching for relevant co-occurring and reoccurring incidents. These relevant incidents reveal patterns of application failures and provide insights into incident resolution and prevention. We also present a case study in which we implement this approach and evaluate its performance in terms of search accuracy.