ExecRecorder: VM-based full-system replay for attack analysis and system recovery

Log-based recovery and replay systems are important for system reliability, debugging, and postmortem analysis of and recovery from malware attacks. Such systems must incur low space and performance overhead, provide full-system replay, and be resilient against attacks. Previous approaches fail to meet these requirements: they replay only a single process, require changes to the host and guest OS, or lack a fully implemented replay component. This paper studies full-system replay for uniprocessors by logging and replaying architectural events. To limit the amount of logged information, we identify the architectural nondeterministic events and encode them compactly. We present ExecRecorder, a full-system, VM-based log-and-replay framework for post-attack analysis and recovery. ExecRecorder replays the execution of an entire system by checkpointing the system state and logging architectural nondeterministic events, and it imposes low performance overhead (less than 4% on average). In our evaluation, its log files grow at about 5.4 GB/hour (arithmetic mean), so it is practical to log for hours or days between checkpoints. ExecRecorder can also be integrated naturally with an IDS and a post-attack analysis tool for intrusion analysis and recovery.
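The checkpoint-plus-event-log idea can be illustrated with a toy model. The following is only a minimal sketch under stated assumptions, not ExecRecorder's implementation: the event kinds, the 13-byte record layout, and the names `Event`, `record`, `replay`, and `execute` are all hypothetical, and the "machine" is a single integer updated by a deterministic step function. It shows that, starting from the same checkpoint, re-injecting the logged nondeterministic events at the same instruction counts reproduces the original execution.

```python
import struct
from dataclasses import dataclass

# Hypothetical event kinds; the real set of architectural events is not given here.
EVENT_INTERRUPT = 1
EVENT_IO_READ = 2

# Assumed compact record: 8-byte instruction count, 1-byte kind, 4-byte payload.
_RECORD = struct.Struct("<QBI")


@dataclass(frozen=True)
class Event:
    icount: int   # instructions executed since the checkpoint
    kind: int     # which nondeterministic event occurred
    payload: int  # e.g. an interrupt vector or a value read from a device


def encode_log(events):
    """Pack events into a compact binary log."""
    return b"".join(_RECORD.pack(e.icount, e.kind, e.payload) for e in events)


def decode_log(data):
    """Unpack a binary log back into Event objects."""
    return [Event(*_RECORD.unpack_from(data, off))
            for off in range(0, len(data), _RECORD.size)]


def execute(checkpoint_state, n_instructions, events):
    """Run the toy machine, injecting each event at its instruction count."""
    pending = {e.icount: e for e in events}
    state = checkpoint_state
    for icount in range(n_instructions):
        event = pending.get(icount)
        if event is not None:
            state ^= event.payload                # nondeterministic input
        state = (state * 31 + 7) & 0xFFFFFFFF     # deterministic step
    return state


def record(checkpoint_state, n_instructions, live_events):
    """Original run: execute with live events and log them as they occur."""
    final_state = execute(checkpoint_state, n_instructions, live_events)
    return final_state, encode_log(live_events)


def replay(checkpoint_state, n_instructions, log_bytes):
    """Replay run: restore the checkpoint and feed events from the log."""
    return execute(checkpoint_state, n_instructions, decode_log(log_bytes))
```

For example, recording 100 instructions with two events and then calling `replay` with the same checkpoint state and the returned log yields the same final state, while the log itself holds only one small record per event rather than a trace of every instruction.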
