A Flexible Architecture for Statistical Learning and Data Mining from System Log Streams

Modern computer systems are instrumented to generate huge amounts of system log data. This data contains valuable information for managing the system, localizing failures, and guiding recovery. However, the complexity of these systems far exceeds what human operators can understand, so automated analysis systems are beginning to be used. Because the statistical algorithms require preprocessing, the extremely high volume of data cannot be handled with ad-hoc scripts. We present a flexible, modular, and scalable architecture for statistical learning from large data streams, designed to process data at this volume. We built a prototype and evaluated it using system log data from a commercial on-line service. The results of the analysis proved useful to the operators of that service.
