NNSMT: Deep Neural Networks for SMT Solvers Fuzzing

SMT solvers are important tools in software engineering and are widely used to determine the satisfiability of formulas in formal methods such as software verification, program synthesis, and program verification. However, because their implementations are complex, solvers may contain critical bugs that lead to unsound results. Fuzz testing is an effective way to detect new errors or security vulnerabilities in SMT solvers, but existing SMT solver fuzzers do not scale well and depend too heavily on complex, hand-written grammar rules. To tackle this challenge, we propose a grammar-based fuzzing tool called NNSMT. NNSMT uses an attention mechanism to capture long-distance dependencies and adopts an encoder-decoder architecture that embeds these dependencies to build a language model of regular programs. It then uses the language model to automatically and continuously generate well-formed SMT test programs, which we use to fuzz off-the-shelf SMT solvers (e.g., Z3, CVC4). We present a detailed case study that analyzes the pass rate and coverage improvement of the generated SMT programs, and we analyze the performance of NNSMT under five new generation strategies and two sampling methods. Extensive experiments show that the average pass rate of SMT test programs generated by NNSMT is more than 85%, and that the code coverage of SMT solvers improves significantly. In our preliminary study, we found three bugs in the Z3 solver, all of which are actively being addressed by developers.
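
As a rough illustration of how generated test programs might be scored and used for fuzzing, the following Python sketch computes a pass rate and flags programs on which two solvers give contradictory answers. It is not the NNSMT implementation. It assumes that "pass rate" means the fraction of programs a solver accepts without an error, and that unsound results are detected by comparing verdicts across solvers; the abstract specifies neither. The names `pass_rate`, `find_disagreements`, `stub_reference`, and `stub_buggy` are hypothetical, and the two stubs stand in for real solvers so that the example runs without any solver installed.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# A solver is modeled as a function from SMT-LIB text to a verdict string:
# "sat", "unsat", "unknown", or "error" (assumed convention for this sketch).
Solver = Callable[[str], str]


def pass_rate(programs: Iterable[str], solver: Solver) -> float:
    """Fraction of programs the solver accepts without reporting an error."""
    programs = list(programs)
    if not programs:
        return 0.0
    passed = sum(1 for program in programs if solver(program) != "error")
    return passed / len(programs)


def find_disagreements(
    programs: Iterable[str], solvers: Dict[str, Solver]
) -> List[Tuple[str, Dict[str, str]]]:
    """Return programs on which solvers give both "sat" and "unsat"."""
    suspicious = []
    for program in programs:
        verdicts = {name: solve(program) for name, solve in solvers.items()}
        definite = {v for v in verdicts.values() if v in ("sat", "unsat")}
        if len(definite) > 1:  # contradictory definite answers
            suspicious.append((program, verdicts))
    return suspicious


# Hypothetical stand-ins for real solvers, for demonstration only.
def stub_reference(program: str) -> str:
    if "(check-sat)" not in program:
        return "error"
    return "unsat" if "false" in program else "sat"


def stub_buggy(program: str) -> str:
    if "(check-sat)" not in program:
        return "error"
    return "sat"  # deliberately wrong on unsatisfiable inputs


programs = [
    "(assert true)(check-sat)",
    "(assert false)(check-sat)",
    "(assert true",  # malformed: missing parenthesis and check-sat
]

rate = pass_rate(programs, stub_reference)
disagreements = find_disagreements(
    programs, {"reference": stub_reference, "buggy": stub_buggy}
)
```

In practice, each stub would be replaced by a function that invokes a real solver on the program text and maps its output to one of the four verdict strings.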
