Automatic input rectification

We present a novel technique, automatic input rectification, and a prototype implementation, SOAP. SOAP learns a set of constraints characterizing typical inputs that an application is highly likely to process correctly. When given an atypical input that does not satisfy these constraints, SOAP automatically rectifies the input (i.e., changes the input so that it satisfies the learned constraints). The goal is to automatically convert potentially dangerous inputs into typical inputs that the program is highly likely to process correctly. Our experimental results show that, for a set of benchmark applications (Google Picasa, ImageMagick, VLC, Swfdec, and Dillo), this approach effectively converts malicious inputs (which successfully exploit vulnerabilities in the application) into benign inputs that the application processes correctly. Moreover, a manual code analysis shows that, if an input does satisfy the learned constraints, it is incapable of exploiting these vulnerabilities. We also present the results of a user study designed to evaluate the subjective perceptual quality of outputs from benign but atypical inputs that have been automatically rectified by SOAP to conform to the learned constraints. Specifically, we obtained benign inputs that violate learned constraints, used our input rectifier to obtain rectified inputs, then paid Amazon Mechanical Turk users to provide their subjective qualitative perception of the difference between the outputs from the original and rectified inputs. The results indicate that rectification can often preserve much, and in many cases all, of the desirable data in the original input.

[1]  Mona Attariyan,et al.  Automating Configuration Troubleshooting with Dynamic Information Flow Analysis , 2010, OSDI.

[2]  David Brumley,et al.  Vulnerability-Specific Execution Filtering for Exploit Prevention on Commodity Software , 2006, NDSS.

[3]  Helen J. Wang,et al.  Tupni: automatic reverse engineering of input formats , 2008, CCS.

[4]  Christopher Krügel,et al.  Anomalous system call detection , 2006, TSEC.

[5]  Jun Xu,et al.  Packet vaccine: black-box exploit detection and signature generation , 2006, CCS '06.

[6]  Christopher Krügel,et al.  Anomaly detection of web-based attacks , 2003, CCS '03.

[7]  Elisabeth Buffard,et al.  VLC Media Player , 2012 .

[8]  Miguel Castro,et al.  Vigilante: end-to-end containment of internet worms , 2005, SOSP '05.

[9]  Salvatore J. Stolfo,et al.  Anomalous Payload-Based Network Intrusion Detection , 2004, RAID.

[10]  Stephen McCamant,et al.  Inference and enforcement of data structure consistency specifications , 2006, ISSTA '06.

[11]  Michael D. Ernst,et al.  Automatically patching errors in deployed software , 2009, SOSP '09.

[12]  Mark Sanderson,et al.  Do user preferences and evaluation measures line up? , 2010, SIGIR.

[13]  Christopher Krügel,et al.  Using Generalization and Characterization Techniques in the Anomaly-based Detection of Web Attacks , 2006, NDSS.

[14]  Martin C. Rinard,et al.  Taint-based directed whitebox fuzzing , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[15]  Richard M. Karp,et al.  Reducibility Among Combinatorial Problems , 1972, 50 Years of Integer Programming.

[16]  Sarfraz Khurshid,et al.  Juzi: a tool for repairing complex data structures , 2008, ICSE.

[17]  Paul Ammann,et al.  Data Diversity: An Approach to Software Fault Tolerance , 1988, IEEE Trans. Computers.

[18]  Alessandro Orso,et al.  Dytan: a generic dynamic taint analysis framework , 2007, ISSTA '07.

[19]  Nicholas Nethercote,et al.  Valgrind: a framework for heavyweight dynamic binary instrumentation , 2007, PLDI '07.

[20]  Aniket Kittur,et al.  Crowdsourcing user studies with Mechanical Turk , 2008, CHI.

[21]  Helen J. Wang,et al.  ShieldGen: Automatic Data Patch Generation for Unknown Vulnerabilities with Informed Probing , 2007, 2007 IEEE Symposium on Security and Privacy (SP '07).

[22]  Christoph Csallner,et al.  Dynamic symbolic data structure repair , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[23]  Xiangyu Zhang,et al.  Deriving input syntactic structure from execution , 2008, SIGSOFT '08/FSE-16.

[24]  Nick Feamster,et al.  Behavioral Clustering of HTTP-Based Malware and Signature Generation Using Malicious Network Traces , 2010, NSDI.

[25]  Christopher Krügel,et al.  Automatic Network Protocol Analysis , 2008, NDSS.

[26]  Martin C. Rinard,et al.  Living in the comfort zone , 2007, OOPSLA.

[27]  Larry L. Peterson,et al.  binpac: a yacc for writing application protocol parsers , 2006, IMC '06.

[28]  Debin Gao,et al.  On Gray-Box Program Tracking for Anomaly Detection , 2004, USENIX Security Symposium.

[29]  Jeffrey Heer,et al.  Crowdsourcing graphical perception: using mechanical turk to assess visualization design , 2010, CHI.

[30]  Guofei Gu,et al.  TaintScope: A Checksum-Aware Directed Fuzzing Tool for Automatic Software Vulnerability Detection , 2010, 2010 IEEE Symposium on Security and Privacy.

[31]  Martin C. Rinard,et al.  Automatically identifying critical input regions and code in applications , 2010, ISSTA '10.

[32]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[33]  Brendan T. O'Connor,et al.  Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks , 2008, EMNLP.

[34]  Giovanni Vigna,et al.  A Learning-Based Approach to the Detection of SQL Attacks , 2005, DIMVA.

[35]  Siddharth Suri,et al.  Conducting behavioral research on Amazon’s Mechanical Turk , 2010, Behavior research methods.

[36]  David G. Rand,et al.  The promise of Mechanical Turk: how online labor markets can help theorists run behavioral experiments. , 2012, Journal of theoretical biology.

[37]  Manuel Costa,et al.  Bouncer: securing software by blocking bad input , 2008, WRAITS '08.

[38]  Brian Demsky Data structure repair using goal-directed reasoning , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[39]  Alexander I. Rudnicky,et al.  Using the Amazon Mechanical Turk for transcription of spoken language , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.