Cross-Project Aging-Related Bug Prediction Based on Joint Distribution Adaptation and Improved Subclass Discriminant Analysis

Software aging, which is caused by Aging-Related Bugs (ARBs), refers to the phenomenon of performance degradation and eventual crash in long-running systems. ARB prediction has been proposed to help discover and remove ARBs. However, because ARBs are rare and difficult to reproduce, it is usually hard to collect sufficient ARB data within a single project. Cross-project ARB prediction addresses this by building the target project’s ARB predictor from the labeled data of a source project. A key point for cross-project ARB prediction is to reduce the distribution difference between the source and target projects. However, existing approaches mainly focus on the marginal distribution difference and largely overlook the conditional distribution difference, and they mainly rely on random oversampling to alleviate class imbalance, which may lead to overfitting. To address these problems, we propose a new cross-project ARB prediction approach based on Joint Distribution Adaptation (JDA) and Improved Subclass Discriminant Analysis (ISDA), called JDA-ISDA. The key idea of JDA-ISDA is to first use JDA to jointly reduce the marginal and conditional distribution differences, and then apply ISDA to alleviate the severe class imbalance problem. Experiments are carried out on two large open-source projects with six different machine learning (ML) classifiers. The state-of-the-art Transfer Learning based Aging-related bug Prediction (TLAP) and Supervised Representation Learning Approach (SRLA) serve as baselines. The experimental results demonstrate that JDA-ISDA is much more robust to the choice of ML classifier than TLAP, and that the average improvement in terms of the balance value reaches up to 31.8%. JDA-ISDA also outperforms TLAP and SRLA on average when logistic regression is chosen as the classifier for the best prediction performance.
