Damage Quarantine and Recovery in Data Processing Systems

In this article, we address transparent Damage Quarantine and Recovery (DQR), an important problem faced today by a large number of mission-, life-, and/or business-critical applications and information systems that must manage risk, business continuity, and assurance in the presence of severe cyber attacks. Today, these critical applications still have a “good” chance of suffering a big “hit” from attacks. Due to data sharing, interdependencies, and interoperability, the hit could greatly “amplify” its damage by causing catastrophic cascading effects, which may “force” an application to halt for hours or even days before it is recovered. In this article, we first thoroughly discuss the limitations of traditional fault tolerance and failure recovery techniques in solving the DQR problem. We then present a systematic review of how the DQR problem is being solved. Finally, we point out some remaining research issues in fully solving the DQR problem.
