Automatically finding the control variables for complex system behavior

Testing large-scale systems is expensive in terms of both time and money. Running simulations early in the design process is a proven way to find the design faults most likely to lead to critical system failures, but determining the exact cause of those failures remains time-consuming and requires access to a limited pool of domain experts. An automated method is therefore desirable, one that explores the large space of input combinations and isolates the likely fault points. Treatment learning is a subset of minimal contrast-set learning that, rather than classifying data into distinct categories, focuses on finding the unique factors that lead to a particular classification: it seeks the smallest change to the data that causes the largest change in the class distribution. These treatments, when imposed, identify the factors most likely to cause a mission-critical failure. The goal of this research is to comparatively assess treatment learning against state-of-the-art numerical optimization techniques. To that end, this paper benchmarks the TAR3 and TAR4.1 treatment learners against optimization techniques across three complex systems, including two projects from the Robust Software Engineering (RSE) group at the National Aeronautics and Space Administration (NASA) Ames Research Center. The results show that treatment learning is both faster and more accurate than the traditional optimization methods.
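
To make the core idea concrete, the sketch below illustrates treatment learning in miniature: given discretized attributes and a utility weight per class, it searches for the single attribute-value constraint whose selection most shifts the class distribution. This is a minimal illustration of the concept only, not the TAR3/TAR4.1 implementation; the data, weights, and single-constraint restriction are assumptions made for the example.

```python
# Illustrative sketch of the treatment-learning idea (NOT TAR3/TAR4.1):
# find the attribute=value constraint that most changes the weighted
# class distribution of the selected rows relative to the baseline.
from collections import Counter
from itertools import chain

def class_score(rows, weights):
    """Weighted class frequency, normalized by the number of rows."""
    if not rows:
        return 0.0
    counts = Counter(r["class"] for r in rows)
    return sum(weights[c] * n for c, n in counts.items()) / len(rows)

def best_treatment(rows, weights):
    """Return the (attribute, value) pair with the highest lift, i.e.
    class_score(selected rows) / class_score(all rows)."""
    baseline = class_score(rows, weights)
    candidates = set(chain.from_iterable(
        ((a, v) for a, v in r.items() if a != "class") for r in rows))
    best, best_lift = None, 1.0
    for attr, val in candidates:
        selected = [r for r in rows if r.get(attr) == val]
        lift = class_score(selected, weights) / baseline if baseline else 0.0
        if lift > best_lift:
            best, best_lift = (attr, val), lift
    return best, best_lift

if __name__ == "__main__":
    # Toy data: each row is a discretized configuration plus an outcome class.
    rows = [
        {"thrust": "high", "mass": "low",  "class": "success"},
        {"thrust": "high", "mass": "high", "class": "failure"},
        {"thrust": "low",  "mass": "low",  "class": "success"},
        {"thrust": "low",  "mass": "high", "class": "failure"},
        {"thrust": "high", "mass": "low",  "class": "success"},
    ]
    weights = {"success": 1.0, "failure": 4.0}  # failures carry the most weight
    print(best_treatment(rows, weights))       # -> (('mass', 'high'), ~1.82)
```

On this toy data the learner reports mass=high as the treatment with the greatest lift toward the heavily weighted failure class, which is the sense in which a treatment isolates the control variables most associated with a critical outcome. The real learners extend this search to conjunctions of attribute ranges and use more careful discretization and scoring.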
