An Early Risk Detection and Management System for the Cloud with Log Parser

Abstract In a software-defined data center (SDDC), detecting potential risks from structured or unstructured data in real time is challenging, yet vital for delivering zero-downtime services. Cloud administrators need to be aware of any operation, or sequence of operations, that could cause a critical failure and the loss of Business/Mission Critical Systems (BCS/MCS). This calls for an Early Risk Detection and Management System (ERDMS), which provides insight into the operations that can put the system in peril and recommends suitable steps to reduce or eliminate the risks involved. In this work, we present our implementation of an ERDMS for the cloud using data analytics, association rule learning, and machine learning techniques. The ERDMS continually monitors various system parameters by processing the sequence of operations performed on the system, in order to detect potential risk(s) and recommend probable solution(s). Initially, it parses log bundles to learn rules for risk detection, known as “association rules”, using the Apriori algorithm. Each association rule consists of a premise (a sequence of operations) and an inference (the potential risk(s)). While monitoring a system, if the ERDMS detects a pattern it has learnt, it classifies the pattern into a set of potential risk(s) using a decision tree algorithm. It computes a probability for each potential risk to gauge its impact, and it also generates a summary of what it has learnt. Once potential risk(s) are detected, it searches relevant sites to recommend a set of probable solution(s). Furthermore, it offers “auto-resolution”, in which the recommended steps are executed automatically. Consequently, when appropriate action is taken, the system may avoid the issue altogether and continue to work seamlessly.
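To make the rule-learning step concrete, the following is a minimal sketch of how Apriori-style association rules of the form "premise (operations) implies inference (risk)" could be mined from parsed logs. It is an illustration under stated assumptions, not the ERDMS implementation: the operation names, the risk label, the thresholds, and the helper names `frequent_itemsets` and `risk_rules` are all hypothetical, and each log record is simplified to an unordered set of operations rather than an ordered sequence.

```python
from itertools import combinations

# Hypothetical parsed log records: each is the set of operations seen in one
# log bundle, plus a risk label if a failure was observed (names are invented).
LOGS = [
    {"snapshot_create", "disk_resize", "RISK_DATASTORE_FULL"},
    {"snapshot_create", "disk_resize", "RISK_DATASTORE_FULL"},
    {"snapshot_create", "vm_migrate"},
    {"disk_resize", "vm_migrate"},
    {"snapshot_create", "disk_resize", "vm_migrate", "RISK_DATASTORE_FULL"},
]
RISKS = frozenset({"RISK_DATASTORE_FULL"})


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support} for frequent itemsets."""
    n = len(transactions)
    # Start with every single item as a candidate 1-itemset.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def risk_rules(frequent, risk_labels, min_confidence):
    """Derive rules whose premise is a set of operations and whose inference is one risk."""
    rules = []
    for itemset, support in frequent.items():
        risks = itemset & risk_labels
        premise = itemset - risk_labels
        if len(risks) != 1 or not premise:
            continue
        # confidence = support(premise and risk) / support(premise)
        confidence = support / frequent[premise]
        if confidence >= min_confidence:
            rules.append((premise, next(iter(risks)), confidence))
    return rules


rules = risk_rules(frequent_itemsets(LOGS, min_support=0.4), RISKS, min_confidence=0.8)
for premise, risk, confidence in rules:
    print(sorted(premise), "->", risk, f"(confidence {confidence:.2f})")
```

On this toy data, the only rule that clears both thresholds has the premise `snapshot_create` together with `disk_resize`. Each operation on its own predicts the risk with lower confidence, because it also appears in a log bundle where no failure occurred. The rule confidence serves here as a simple stand-in for the per-risk probability mentioned above; the decision tree classification stage is not shown.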