Using System Profiling for Effective Degradation Detection

Many computing systems suffer from degradation that emerges over time and reduces the value of the system. For an autonomic computing (AC) system to self-optimize and self-heal, it must rapidly detect degradation and identify its source during execution, even when the degradation appears slowly. A common technique is to employ a MAPE-K (monitor-analyze-plan-execute plus knowledge) loop. To monitor degradation, the system must have an embedded representation of its expected performance over time so that it can detect a change at runtime and determine how the degradation is propagating through the system. With this information, an appropriate adaptation can be executed to address potential system failures. However, in many industrial applications, incorporating runtime monitoring and analysis requires studying historical data captures and relating them to prior system and process performance and degradation in order to construct the appropriate reference architectures. In addition, the monitoring process can itself degrade and report inaccurate performance outcomes. In this paper, we present an industrial application in which the performance of system processes depends on the integrity of shared system components, each process is monitored by an independent, local monitoring process, and these monitors are subject to other, independent sources of degradation. We demonstrate that analyzing process performance with the Kolmogorov-Smirnov (KS) test can detect degradation. Using Pearson correlations with other processes in the system, together with application experimentation, we show that it is possible to determine whether the system processes or the monitor processes are affected and whether there is a system-wide failure.
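
As a rough illustration of the two statistical tools named above, the following Python sketch computes a two-sample KS statistic between a reference profile and a recent window of measurements, and a Pearson correlation between two processes. It is a minimal sketch and not the implementation used in this work: the function names, the synthetic cycle-time data, the window size of 200 samples, and the 0.05 significance level are all assumptions made for the example.

```python
import math
import random


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        # Advance both samples past x so ties are handled consistently.
        while i < n and ref[i] <= x:
            i += 1
        while j < m and cur[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample approximation of the two-sample KS critical value."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Synthetic data (assumed, for illustration only): cycle times of one process.
rng = random.Random(0)
baseline = [rng.gauss(100.0, 5.0) for _ in range(200)]  # reference profile
recent = [rng.gauss(110.0, 5.0) for _ in range(200)]    # shifted distribution

d = ks_statistic(baseline, recent)
threshold = ks_critical(len(baseline), len(recent))
print("KS statistic:", round(d, 3), "critical value:", round(threshold, 3))
print("degradation flagged:", d > threshold)

# A second process that tracks the first, plus independent noise (assumed).
other = [x + rng.gauss(0.0, 1.0) for x in recent]
print("Pearson r with other process:", round(pearson(recent, other), 3))
```

A KS statistic above the critical value indicates that the recent window no longer follows the reference distribution. How correlations across processes are then used to separate process degradation from monitor degradation is not something this sketch attempts to decide.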