Formal Verification of Decision-Tree Ensemble Model and Detection of its Violating-input-value Ranges

A "decision-tree ensemble model" (DTEM) is a type of machine-learning model represented by a set of decision trees. DTEMs are known to work well mainly on structured data; however, as with other machine-learning models, it is difficult to train a DTEM so that it returns the correct output value for every input value. Accordingly, when a DTEM is used in a system that requires reliability, it is important to comprehensively detect, during development, the input values that lead to malfunctions of the system (failures) and to take appropriate measures. One conceivable measure is to install an input filter that controls the input to the DTEM and to use separate software to process the input values that may lead to failures. Developing such an input filter requires specifying the filtering condition, that is, the condition that characterizes the input values leading to a failure. In this paper, we therefore propose a method that formally verifies a DTEM and, if the verification finds an input value leading to a failure, extracts the range in which such input values exist. Because the proposed method extracts these ranges comprehensively, an input filter built from them can prevent the failure from occurring in the system. We describe the algorithm of the proposed method and present the results of a case study using a dataset of house prices. On the basis of those results, we demonstrate the feasibility of the proposed method and evaluate its scalability.
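To make the idea of a "violating-input-value range" concrete, the following is a minimal sketch in Python. It is not the algorithm of the paper, which the abstract does not detail. It assumes a toy additive ensemble of two hand-written trees and replaces formal verification with brute-force enumeration of leaf combinations, which is only practical for very small ensembles. The feature names (`area`, `age`), the thresholds, the leaf values, and the property "the output must not exceed 35" are all hypothetical.

```python
from itertools import product
from math import inf

# A node is either a leaf value (float) or a split
# (feature, threshold, left, right); the left branch is taken
# when x[feature] < threshold. Both trees are hypothetical.
TREE_A = ("area", 100.0, 10.0, ("area", 200.0, 20.0, 35.0))
TREE_B = ("age", 30.0, 5.0, -5.0)


def paths(node, box=None):
    """Yield (box, leaf_value) for every reachable root-to-leaf path.

    A box maps each constrained feature to a half-open interval [lo, hi).
    """
    box = box or {}
    if not isinstance(node, tuple):
        yield box, node
        return
    feat, thr, left, right = node
    lo, hi = box.get(feat, (-inf, inf))
    if lo < thr:  # some x in [lo, hi) satisfies x < thr
        yield from paths(left, {**box, feat: (lo, min(hi, thr))})
    if thr < hi:  # some x in [lo, hi) satisfies x >= thr
        yield from paths(right, {**box, feat: (max(lo, thr), hi)})


def intersect(b1, b2):
    """Intersect two boxes; return None if the intersection is empty."""
    out = dict(b1)
    for feat, (lo, hi) in b2.items():
        cur_lo, cur_hi = out.get(feat, (-inf, inf))
        new_lo, new_hi = max(lo, cur_lo), min(hi, cur_hi)
        if new_lo >= new_hi:
            return None
        out[feat] = (new_lo, new_hi)
    return out


def violating_ranges(trees, violates):
    """Return (box, output) for every input region whose summed output
    satisfies the predicate `violates`."""
    found = []
    for combo in product(*(list(paths(t)) for t in trees)):
        box, total = {}, 0.0
        for leaf_box, value in combo:
            box = intersect(box, leaf_box)
            if box is None:
                break  # this leaf combination is unreachable
            total += value
        else:
            if violates(total):
                found.append((box, total))
    return found


if __name__ == "__main__":
    # Hypothetical property: the ensemble output must not exceed 35.
    for box, output in violating_ranges([TREE_A, TREE_B], lambda y: y > 35.0):
        print(box, output)
```

Each returned box is a conjunction of interval constraints on the input features, which is the kind of condition an input filter could check before forwarding an input to the model. The number of leaf combinations grows exponentially with the number of trees, so this enumeration only illustrates the output format and says nothing about the scalability of the proposed method.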
