Docovery: toward generic automatic document recovery

Application crashes and errors that occur while loading a document are one of the most visible defects of consumer software. While documents become corrupted in various ways---from storage media failures to incompatibility across applications to malicious modifications---the underlying reason they fail to load in a certain application is that their contents cause the application logic to exercise an uncommon execution path which the software was not designed to handle, or which was not properly tested. We present Docovery, a novel document recovery technique based on symbolic execution that makes it possible to fix broken documents without any prior knowledge of the file format. Starting from the code path executed when opening a broken document, Docovery explores alternative paths that avoid the error, and makes small changes to the document in order to force the application to follow one of these alternative paths. We implemented our approach in a prototype tool based on the symbolic execution engine KLEE. We present a preliminary case study, which shows that Docovery can successfully recover broken documents processed by several popular applications such as the e-mail client pine, the pagination tool pr and the binary file utilities dwarfdump and readelf.

[1]  Xuxian Jiang,et al.  AutoPaG: towards automated software patch generation with source code root cause identification and repair , 2007, ASIACCS '07.

[2]  Xin Yao,et al.  A novel co-evolutionary approach to automatic software bug fixing , 2008, 2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence).

[3]  Nikolaj Bjørner,et al.  Satisfiability modulo theories , 2011, Commun. ACM.

[4]  Dawson R. Engler,et al.  EXE: automatically generating inputs of death , 2006, CCS '06.

[5]  Koushik Sen,et al.  DART: directed automated random testing , 2005, PLDI '05.

[6]  James C. King,et al.  Symbolic execution and program testing , 1976, CACM.

[7]  Alessandro Orso,et al.  Penumbra: automatically identifying failure-relevant inputs using dynamic tainting , 2009, ISSTA.

[8]  Cristian Cadar,et al.  KATCH: high-coverage testing of software patches , 2013, ESEC/FSE 2013.

[9]  Helen J. Wang,et al.  ShieldGen: Automatic Data Patch Generation for Unknown Vulnerabilities with Informed Probing , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).

[10]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[11]  Martin C. Rinard,et al.  Taint-based directed whitebox fuzzing , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[12]  Fan Long,et al.  Automatic input rectification , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[13]  Miguel Castro,et al.  Vigilante: end-to-end containment of internet worms , 2005, SOSP '05.

[14]  Vikram S. Adve,et al.  LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[15]  Koushik Sen,et al.  CUTE: a concolic unit testing engine for C , 2005, ESEC/FSE-13.

[16]  Koushik Sen,et al.  Symbolic execution for software testing: three decades later , 2013, CACM.

[17]  Nikolai Tillmann,et al.  Pex-White Box Test Generation for .NET , 2008, TAP.

[18]  Martin C. Rinard,et al.  Living in the comfort zone , 2007, OOPSLA.

[19]  Junfeng Yang,et al.  Automatically generating malicious disks using symbolic execution , 2006, 2006 IEEE Symposium on Security and Privacy (S&P'06).

[20]  David Brumley,et al.  AEG: Automatic Exploit Generation , 2011, NDSS.

[21]  Manuel Costa,et al.  Bouncer: securing software by blocking bad input , 2008, WRAITS '08.

[22]  Tzi-cker Chiueh,et al.  Automatic Patch Generation for Buffer Overflow Attacks , 2007, Third International Symposium on Information Assurance and Security.

[23]  Michael D. Ernst,et al.  Automatically patching errors in deployed software , 2009, SOSP '09.

[24]  Christof Fetzer,et al.  Automatically Finding and Patching Bad Error Handling , 2006, 2006 Sixth European Dependable Computing Conference.

[25]  Corina S. Pasareanu,et al.  JPF-SE: A Symbolic Execution Extension to Java PathFinder , 2007, TACAS.

[26]  Claire Le Goues,et al.  Automatically finding patches using genetic programming , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[27]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[28]  Cristian Cadar,et al.  make test-zesti: A symbolic execution solution for improving regression testing , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[29]  Martin C. Rinard,et al.  Automatic detection and repair of errors in data structures , 2003, OOPSLA '03.

[30]  David Brumley,et al.  All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask) , 2010, 2010 IEEE Symposium on Security and Privacy.

[31]  Dawei Qi,et al.  SemFix: Program repair via semantic analysis , 2013, 2013 35th International Conference on Software Engineering (ICSE).