Recognizing Faults in Software Related Difficult Data

In this paper, we investigate a number of machine learning algorithms, with emphasis on multilayer artificial neural networks, for software source code fault prediction. The main contribution is an enhanced data pre-processing step, offered as a partial solution for handling difficult software-related data. Before the data are fed into an artificial neural network, we apply PCA (Principal Component Analysis) and k-means clustering. The data-clustering step improves the quality of the whole dataset. With this approach, we obtained a 10% increase in fault-detection accuracy. To ensure reliable results, we use 10-fold cross-validation in the experiments. We also evaluate a wide range of hyperparameter settings for the network and compare the results with state-of-the-art cost-sensitive approaches: Random Forest, AdaBoost, RepTrees, and GBT (Gradient Boosted Trees).
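The pre-processing pipeline described above can be sketched in plain Python using only the standard library. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say how the k-means output is used, so here each PCA-projected row simply gets a one-hot cluster indicator appended (a hypothetical choice). PCA is computed by power iteration with deflation, and the 10-fold cross-validation is stratified by class, an assumption that suits imbalanced fault data. The function names `standardize`, `pca`, `kmeans`, `preprocess` and `stratified_kfold` are illustrative. The resulting feature rows would then be passed to a multilayer neural network or to one of the baseline classifiers.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardize(X: Matrix) -> Matrix:
    """Scale every feature (column) to zero mean and unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0
            for j in range(d)]  # constant columns keep a divisor of 1
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in X]


def pca(X: Matrix, n_components: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Project centred data onto its top principal components.

    Uses power iteration with deflation on the covariance matrix.
    """
    n, d = len(X), len(X[0])
    rng = random.Random(seed)
    cov = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Eigenvalue estimate (Rayleigh quotient), then deflate so the next
        # pass converges to the next-largest component.
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in X]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X: Matrix, k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Lloyd's algorithm; returns the cluster index of every row."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: _sq_dist(p, centroids[c])) for p in X]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def preprocess(X: Matrix, n_components: int, n_clusters: int) -> Matrix:
    """Standardize, project with PCA, cluster with k-means, append one-hot cluster id.

    Appending the cluster id is a hypothetical choice for illustration only.
    """
    Z = pca(standardize(X), n_components)
    labels = kmeans(Z, n_clusters)
    return [z + [1.0 if lab == c else 0.0 for c in range(n_clusters)]
            for z, lab in zip(Z, labels)]


def stratified_kfold(y: Sequence[int], n_splits: int = 10,
                     seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_idx, test_idx) pairs; each class is spread evenly over folds."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_splits)]
    offset = 0  # running counter keeps fold sizes balanced across classes
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[(offset + pos) % n_splits].append(i)
        offset += len(idx)
    all_idx = set(range(len(y)))
    return [(sorted(all_idx - set(f)), sorted(f)) for f in folds]
```

For brevity, `preprocess` is fitted on the whole array. In a real cross-validation run, the scaling, PCA and k-means steps should be fitted on each training fold only and then applied to the matching test fold; otherwise information leaks from the test data into training.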
