Propheticus: Machine Learning Framework for the Development of Predictive Models for Reliable and Secure Software

The growing complexity of software calls for innovative solutions that support the deployment of reliable and secure software. Machine Learning (ML) has shown its applicability to various complex problems and is frequently used in the dependability domain, supporting both system design and verification activities. However, using ML is complex and highly dependent on the problem at hand, which increases the probability of mistakes that compromise the results. In this paper, we introduce Propheticus, an ML framework for creating predictive models for reliable and secure software systems. Propheticus abstracts much of the complexity of ML while remaining easy to use and adaptable to the needs of its users. To demonstrate its use, we present two case studies, on vulnerability prediction and online failure prediction, that show how it can considerably ease and expedite a thorough ML workflow.
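
To make the complexity argument concrete, the sketch below illustrates the kind of boilerplate a practitioner would otherwise assemble by hand for a single experiment: feature scaling, class-imbalance handling, and cross-validated comparison of candidate classifiers. This is an illustrative sketch only, written against a scikit-learn/imbalanced-learn stack; it does not reproduce Propheticus's own API, and the dataset, feature semantics, and estimator choices are placeholders.

```python
# Illustrative only: a hand-rolled workflow of the kind Propheticus abstracts.
# The data, labels, and chosen estimators are placeholders, not the framework's API.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

# Placeholder data: software metrics (X) and a binary "faulty/vulnerable" label (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (rng.random(500) < 0.1).astype(int)  # heavily imbalanced classes, as is typical

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_rbf": SVC(kernel="rbf", gamma="scale"),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, estimator in candidates.items():
    # Scaling and resampling live inside the pipeline so they are refit on each
    # training fold, avoiding leakage of test data into preprocessing.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("balance", SMOTE(random_state=0)),
        ("model", estimator),
    ])
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1")
    print(f"{name}: F1 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

Even this minimal setup forces the user to make (and correctly order) several methodological decisions; a framework such as Propheticus aims to encapsulate these steps so they can be configured rather than reimplemented for each study.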
