Online Anomaly Symptom Detection and Process's Resource Usage Control

In this paper we propose an online, lightweight mechanism for detecting anomaly symptoms and controlling a process’s resource usage. Our system collects fine-grained resource information that reflects subtle changes in an application’s behavior. It then builds models with a learning-based algorithm that requires no manual configuration. When an anomaly symptom is detected, an automatic procedure starts: the system controls the suspected application’s resource use by imposing an upper bound on the resources its process can consume. This makes the application yield the CPU to administrative inspection. We describe the overall architecture of the system and evaluate it with both nondeterministic and deterministic failures. Our experimental results indicate that the prototype system detects nondeterministic failures with high precision in anomaly training and controls the application’s resource use with an overhead of about 1%.
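The abstract does not name the learning algorithm, the resource features, or the OS interface used to impose the upper bound, so the following is only a minimal sketch under stated assumptions. It substitutes plain k-means clustering over per-interval resource samples as the "learning-based" model. It also derives the detection threshold from the normal training data itself (the largest distance any training sample lies from its nearest centroid), so no threshold is tuned by hand, although the cluster count `k` is still our choice. The control step is modeled as a per-interval CPU cap applied once a symptom has been seen. All names (`kmeans`, `AnomalyDetector`, `control_loop`, `cap_ms`) and the feature layout, with CPU milliseconds as the first element, are hypothetical and are not the paper's implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical per-interval feature vector; element 0 is assumed to be CPU ms demanded.
Sample = tuple[float, ...]


def _dist(a: Sample, b: Sample) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(data: list[Sample], k: int, iters: int = 20) -> list[Sample]:
    """Lloyd's k-means; the first k samples are the (deterministic) initial centroids."""
    centroids = [data[i] for i in range(k)]
    for _ in range(iters):
        clusters: list[list[Sample]] = [[] for _ in range(k)]
        for s in data:
            nearest = min(range(k), key=lambda j: _dist(s, centroids[j]))
            clusters[nearest].append(s)
        # Each centroid moves to its cluster mean; an empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(col) / len(c) for col in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
    return centroids


@dataclass
class AnomalyDetector:
    centroids: list[Sample]
    threshold: float

    @classmethod
    def train(cls, normal: list[Sample], k: int = 2) -> "AnomalyDetector":
        centroids = kmeans(normal, k)
        # Threshold learned from data: the farthest any normal sample lies from its centroid.
        threshold = max(min(_dist(s, c) for c in centroids) for s in normal)
        return cls(centroids, threshold)

    def score(self, s: Sample) -> float:
        return min(_dist(s, c) for c in self.centroids)

    def is_anomalous(self, s: Sample) -> bool:
        return self.score(s) > self.threshold


def control_loop(detector: AnomalyDetector, samples: list[Sample], cap_ms: float) -> list[float]:
    """Return the CPU ms granted per interval; after a symptom, cap the process at cap_ms."""
    limited = False
    granted: list[float] = []
    for s in samples:
        granted.append(min(s[0], cap_ms) if limited else s[0])
        if detector.is_anomalous(s):
            limited = True  # the upper bound applies from the next interval onward
    return granted


if __name__ == "__main__":
    normal = [(10, 100), (12, 110), (11, 105), (50, 500), (52, 510), (51, 505)]
    det = AnomalyDetector.train(normal, k=2)
    print(control_loop(det, [(11, 106), (90, 900), (95, 950)], cap_ms=20.0))  # -> [11, 90, 20.0]
```

Deriving the threshold from the training envelope avoids hand-set limits but flags any behavior not seen during training; a real deployment would also need an OS-level mechanism, such as a CPU reservation or scheduler quota, to actually enforce the cap rather than just compute it.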
