Silas: High Performance, Explainable and Verifiable Machine Learning

This paper introduces Silas, a new classification tool built to provide a more transparent and dependable data analytics service. Silas focuses on providing a formal foundation for decision trees in order to support logical analysis and verification of learned prediction models. The paper describes the distinctive features of Silas: the Model Audit module formally verifies the prediction model against user specifications; the Enforcement Learning module trains prediction models that are guaranteed to be correct; and the Model Insight and Prediction Insight modules reason about the prediction model and explain the decision-making behind its predictions. We also discuss implementation details, ranging from programming paradigm to memory management, that help achieve high-performance computation.
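To make the idea of verifying a decision tree against a user specification concrete, the following is a minimal sketch and not Silas's actual algorithm or API. It assumes a single axis-aligned decision tree and a specification of the form "for every input inside a given box of feature ranges, the prediction is never a forbidden label". All names here (`Leaf`, `Node`, `reachable_labels`, `satisfies`, and the example features) are hypothetical and chosen for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    feature: str
    threshold: float  # go left if x[feature] <= threshold, otherwise right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]
# A box maps a feature to a half-open interval (low, high]:
# low is exclusive, high is inclusive. Missing features are unconstrained.
Box = Dict[str, Tuple[float, float]]


def reachable_labels(tree: Tree, box: Box) -> Set[str]:
    """Collect the labels of all leaves reachable by some input inside the box."""
    if isinstance(tree, Leaf):
        return {tree.label}
    low, high = box.get(tree.feature, (-math.inf, math.inf))
    labels: Set[str] = set()
    # Left branch requires x <= threshold, so tighten the upper bound.
    left_high = min(high, tree.threshold)
    if low < left_high:  # the interval (low, left_high] is non-empty
        labels |= reachable_labels(tree.left, {**box, tree.feature: (low, left_high)})
    # Right branch requires x > threshold, so tighten the lower bound.
    right_low = max(low, tree.threshold)
    if right_low < high:  # the interval (right_low, high] is non-empty
        labels |= reachable_labels(tree.right, {**box, tree.feature: (right_low, high)})
    return labels


def satisfies(tree: Tree, box: Box, forbidden: str) -> bool:
    """True if no input inside the box can be classified as the forbidden label."""
    return forbidden not in reachable_labels(tree, box)


# Hypothetical example tree over two made-up features.
tree = Node(
    "income", 50.0,
    left=Node("age", 25.0, Leaf("reject"), Leaf("review")),
    right=Node("age", 18.0, Leaf("reject"), Leaf("approve")),
)

# Specification: anyone with age <= 18 is never approved.
holds = satisfies(tree, {"age": (-math.inf, 18.0)}, "approve")
# A weaker precondition (age <= 21) admits inputs that the tree approves.
violated = not satisfies(tree, {"age": (-math.inf, 21.0)}, "approve")
```

Because every split is a threshold test on a single feature, tracking one interval per feature along each root-to-leaf path is enough to decide exactly which leaves a box of inputs can reach. Richer specifications or ensembles of trees would need a more general reasoning procedure than this sketch provides.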
