CLORIFI: software vulnerability discovery using code clone verification

Software vulnerabilities have long been a serious threat to system security. A vulnerability is often replicated when programmers reuse code, and security patches are usually not propagated to all clones of the patched code; those patches, however, can be leveraged to discover unknown vulnerabilities. Static code auditing approaches scan source code for security flaws but generate too many false positives, while dynamic execution analysis methods report vulnerabilities precisely but are inefficient at path exploration, which prevents them from scaling to large programs. To detect vulnerabilities both scalably and precisely, we propose a novel mechanism, software vulnerability discovery using Code Clone Verification (CLORIFI), that scalably discovers vulnerabilities in real-world programs using code clone verification. First, a fast and scalable syntax-based technique locates code clones in program source code based on released security patches. The candidate clones are then verified by concolic testing, which dramatically reduces false positives, and the path explosion problem is mitigated by backward sensitive-data tracing during concolic execution. Experiments were conducted on real-world open-source projects (recent Linux distributions and program packages): we found 7 real vulnerabilities out of 63 code clones in Ubuntu 14.04 LTS (Canonical, London, UK) and 10 vulnerabilities out of 40 code clones in CentOS 7.0 (The CentOS Project). Furthermore, we confirmed more code clone vulnerabilities in various versions of programs including Rsyslog, Apache (Apache Software Foundation, Forest Hill, Maryland, USA), and Firefox (Mozilla Corporation, Mountain View, California, USA).
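The syntax-based clone-detection stage described above can be illustrated with a minimal sketch. This is not CLORIFI's actual implementation; it is a hypothetical fingerprinting scheme (normalize each line, slide an n-line window, hash each window) in the spirit of patch-based clone matching, where all function names and the window size are illustrative assumptions:

```python
# Hypothetical sketch of syntax-based clone detection: normalize lines,
# hash sliding n-line windows, and report a clone when every window of
# the known-vulnerable snippet reappears in the target file.
import hashlib
import re


def normalize(line: str) -> str:
    """Strip C-style comments and collapse whitespace (crude normalization)."""
    line = re.sub(r"//.*|/\*.*?\*/", "", line)
    return re.sub(r"\s+", " ", line).strip()


def fingerprints(code: str, n: int = 4) -> set:
    """Hash every window of n consecutive normalized, non-empty lines."""
    lines = [l for l in (normalize(l) for l in code.splitlines()) if l]
    return {
        hashlib.sha1("\n".join(lines[i:i + n]).encode()).hexdigest()
        for i in range(max(len(lines) - n + 1, 1))
    }


def is_clone(vulnerable_snippet: str, target_file: str, n: int = 4) -> bool:
    """Target contains a clone if all fingerprints of the snippet reappear."""
    vuln = fingerprints(vulnerable_snippet, n)
    return bool(vuln) and vuln <= fingerprints(target_file, n)
```

Because matching is purely syntactic, any target that passes this check is only a candidate; in CLORIFI's pipeline such candidates are handed to concolic testing for verification.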
To evaluate the effectiveness of vulnerability verification systematically, we also used the Juliet Test Suite as a measurement object. The results show that CLORIFI achieves 98% accuracy with zero false positives. Copyright © 2015 John Wiley & Sons, Ltd.
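The backward sensitive-data tracing mentioned above can be sketched as a backward slice over def-use information: starting from a sensitive sink, only statements that can influence the sink are retained, so concolic execution need not explore paths through unrelated code. The statement representation and variable names below are illustrative assumptions, not CLORIFI's actual data structures:

```python
# Hypothetical backward-slice sketch: statements are (defined_var, used_vars)
# pairs in program order; walk them in reverse from the sink, keeping only
# definitions whose result can flow into the sensitive sink.
def backward_slice(statements, sink_var):
    """Return the indices of statements that may influence sink_var."""
    needed = {sink_var}   # variables whose definitions we still need
    kept = []
    for idx in range(len(statements) - 1, -1, -1):
        var, uses = statements[idx]
        if var in needed:
            needed.discard(var)   # this statement defines the variable
            needed.update(uses)   # its operands become relevant in turn
            kept.append(idx)
    return sorted(kept)
```

For example, with statements `a = input()`, `b = f(a)`, `c = g()`, `buf = copy(b)` and the sink `buf`, the slice keeps indices 0, 1, and 3 and prunes the irrelevant definition of `c`, shrinking the path space the concolic engine must cover.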
