Discovering specification violations in networked software systems

Publicly released software implementations of network protocols often have bugs that arise from latent specification violations. We present Ape, a technique that explores program behavior to identify potential specification violations. Ape overcomes the challenge of exploring the large space of possible behavior by dynamically inferring precise models of behavior, stimulating unobserved behavior that is likely to lead to violations, and refining the behavioral models with the newly stimulated behavior. Ape can (1) discover new specification violations, (2) verify that violations are removed, (3) identify related violations in other versions and implementations of the protocols, and (4) generate tests. Ape works on binaries and requires only a lightweight description of the protocol's network messages and a violation characteristic. We use Ape to rediscover the known Heartbleed bug in OpenSSL, and to discover one previously unknown bug and two unexpected uses of three popular BitTorrent clients. Manual inspection of Ape-produced artifacts reveals four additional, previously unknown specification violations in OpenSSL and μTorrent.
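The infer, stimulate, and refine loop described above can be illustrated with a minimal sketch. This is not Ape's algorithm or implementation: it assumes a toy system exposed as a hypothetical `step(state, message)` function, models behavior as a plain table of observed transitions, and treats the violation characteristic as a hypothetical predicate over a single transition. All names (`explore`, `toy_step`, `data_before_hello`) and the toy protocol are illustrative assumptions.

```python
from collections import deque


def explore(step, initial_state, messages, is_violation, max_rounds=100):
    """Build a transition model by stimulating unobserved (state, message) pairs.

    step(state, msg) -> next state   (hypothetical system under test)
    is_violation(state, msg, nxt)    (hypothetical violation characteristic)
    """
    model = {}        # (state, message) -> next state observed so far
    violations = []   # transitions flagged by the violation predicate
    seen = {initial_state}
    frontier = deque([initial_state])
    rounds = 0
    while frontier and rounds < max_rounds:
        state = frontier.popleft()
        rounds += 1
        for msg in messages:
            if (state, msg) in model:
                continue                  # behavior already observed
            nxt = step(state, msg)        # stimulate unobserved behavior
            model[(state, msg)] = nxt     # refine the model with the result
            if is_violation(state, msg, nxt):
                violations.append((state, msg, nxt))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return model, violations


def toy_step(state, msg):
    """Toy peer (assumption): it should reject "data" before "hello"."""
    if state == "idle" and msg == "hello":
        return "ready"
    if msg == "data":
        return "transferring"   # bug: accepted even from "idle"
    if msg == "bye":
        return "idle"
    return state                # any other message is ignored


def data_before_hello(state, msg, nxt):
    """Toy violation characteristic: a transfer starts without a handshake."""
    return state == "idle" and nxt == "transferring"


model, violations = explore(
    toy_step, "idle", ["hello", "data", "bye"], data_before_hello
)
print(violations)
```

In this sketch every unobserved pair is tried exhaustively, which is feasible only because the toy state space is tiny; the abstract instead describes prioritizing behavior that is likely to lead to violations.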
