Silas: A high-performance machine learning foundation for logical reasoning and verification

Abstract: This paper introduces Silas, a new high-performance machine learning tool built to provide a more transparent, dependable, and efficient data analytics service. We discuss the machine learning aspects of Silas and demonstrate its advantages in predictive and computational performance, showing that several customised algorithms in Silas yield better predictions in significantly less time than the state of the art. Another focus of Silas is a formal foundation for decision trees that supports logical analysis and verification of learned prediction models. We illustrate the potential of fusing machine learning with logical reasoning through applications in three directions: formally verifying a prediction model against user specifications, training correct-by-construction models, and explaining the decision-making behind predictions.
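To make the first direction concrete, the sketch below shows one standard way a decision tree can be checked against a user specification: enumerate every root-to-leaf path as a conjunction of threshold constraints, and report a violation if any path region that intersects the specification's precondition predicts the wrong label. This is a minimal, hypothetical illustration of the general technique, not Silas's actual API; the `Node`/`Leaf` types, feature names, and the `verify` helper are all invented for this example.

```python
from dataclasses import dataclass

@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    feature: str
    threshold: float
    left: "Node | Leaf"   # taken when feature <= threshold
    right: "Node | Leaf"  # taken when feature >  threshold

def paths(tree, constraints=()):
    """Yield (path constraints, predicted label) for every root-to-leaf path."""
    if isinstance(tree, Leaf):
        yield list(constraints), tree.label
    else:
        yield from paths(tree.left,  (*constraints, (tree.feature, "<=", tree.threshold)))
        yield from paths(tree.right, (*constraints, (tree.feature, ">",  tree.threshold)))

def region(constraints):
    """Reduce (feature, op, threshold) constraints to per-feature intervals.
    Returns None when the constraints are jointly unsatisfiable."""
    bounds = {}
    for feat, op, t in constraints:
        lo, hi = bounds.get(feat, (float("-inf"), float("inf")))
        if op == "<=":
            hi = min(hi, t)
        else:  # ">"
            lo = max(lo, t)
        if lo >= hi:
            return None
        bounds[feat] = (lo, hi)
    return bounds

def verify(tree, precondition, required_label):
    """Check the specification: every input satisfying `precondition`
    must be classified as `required_label`.
    Returns (True, None) if the model satisfies the specification,
    or (False, counterexample path constraints) otherwise."""
    for constrs, label in paths(tree):
        if label != required_label and region(constrs + precondition) is not None:
            return False, constrs
    return True, None

# A toy risk model: the specification "age > 60 implies high_risk" fails,
# because the path age > 60, bp <= 140 still predicts low_risk.
tree = Node("age", 60.0,
            Leaf("low_risk"),
            Node("bp", 140.0, Leaf("low_risk"), Leaf("high_risk")))
```

Because every internal test is a simple threshold, path enumeration is exact here; production tools typically hand the same path constraints to an SMT solver instead, which scales to richer specifications over tree ensembles.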
