A survey on online monitoring approaches of computer-based systems

This report surveys forms of online data collection that are in current use and that remain the subject of research to adapt them to changing technology and demands. Although these forms of data collection are not primarily meant for this purpose, they can serve as inputs to the assessment of dependability and resilience.
