Cost-sensitive and ensemble-based prediction model for outsourced software project risk prediction

Nowadays, software is mainly developed through outsourcing, which has become one of the most important business strategies in the software industry. However, outsourced projects are often associated with a high failure rate. To help ensure their success, past research has aimed to develop intelligent risk prediction models that evaluate the success rate and cost-effectiveness of software projects. In this study, we first summarized related work from the past 20 years and observed that all existing prediction models assume equal misclassification costs, which does not reflect how software projects are actually managed. In practice, overlooking a project that will fail is far more serious than misclassifying a success-prone project as a failure. Moreover, ensemble learning, a technique widely recognized to improve prediction performance in other fields, has not yet been comprehensively studied in software project risk prediction. This study aims to close these research gaps by exploring cost-sensitive analysis and classifier ensemble methods. A comparative analysis using t-tests on 60 different risk prediction models, built from 327 outsourced software project samples, suggests that the ideal model is a homogeneous ensemble of decision trees (DT) based on bagging. Interestingly, DT underperformed the Support Vector Machine (SVM) in accuracy (i.e., when equal misclassification costs are assumed) but outperformed it in the cost-sensitive analysis under the proposed framework. In conclusion, this study proposes the first cost-sensitive and ensemble-based hybrid modeling framework (COSENS) for software project risk prediction. In addition, it establishes a new, rigorous standard for evaluating software risk prediction models that takes misclassification costs into account.
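To make the cost argument above concrete, the following Python sketch compares two classifiers by accuracy and by average misclassification cost. It is a minimal illustration under stated assumptions, not the COSENS implementation: the 5:1 cost ratio, the confusion counts, and the names `Confusion`, `average_cost`, and `failure_threshold` are all hypothetical. Project failure is treated as the positive class, so a false negative is a failed project predicted to succeed, and correct predictions are assumed to cost nothing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts, treating project *failure* as the positive class (hypothetical helper)."""
    tp: int  # failed projects correctly flagged as failure
    fn: int  # failed projects missed (predicted success): the costly error
    fp: int  # successful projects wrongly flagged as failure
    tn: int  # successful projects correctly predicted as success

    @property
    def total(self) -> int:
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self) -> float:
        # Equal-cost view: every error counts the same.
        return (self.tp + self.tn) / self.total

    def average_cost(self, cost_fn: float, cost_fp: float) -> float:
        """Mean misclassification cost per project; correct predictions cost 0 (assumption)."""
        return (cost_fn * self.fn + cost_fp * self.fp) / self.total


def failure_threshold(cost_fn: float, cost_fp: float) -> float:
    """Smallest P(failure) at which predicting 'failure' has the lower expected cost.

    Predict failure when p * cost_fn >= (1 - p) * cost_fp,
    i.e. when p >= cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


# Hypothetical 5:1 cost ratio and made-up confusion counts (not results from the study).
COST_FN, COST_FP = 5.0, 1.0
model_a = Confusion(tp=10, fn=10, fp=2, tn=78)  # more accurate, but misses more failures
model_b = Confusion(tp=17, fn=3, fp=12, tn=68)  # less accurate, but misses fewer failures

for name, m in [("model_a", model_a), ("model_b", model_b)]:
    print(name, round(m.accuracy(), 2), round(m.average_cost(COST_FN, COST_FP), 2))
print("threshold", round(failure_threshold(COST_FN, COST_FP), 3))
```

With these made-up counts, `model_a` is more accurate (0.88 vs. 0.85) yet has nearly twice the average cost (0.52 vs. 0.27), which is the kind of reversal the abstract reports between SVM and DT. The threshold function applies the standard minimum-expected-cost decision rule: under a 5:1 ratio, predicting failure is already the cheaper choice once the estimated failure probability reaches 1/6 rather than 1/2.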
Highlights:
- The first cost-sensitive and ensemble framework to predict software project risk.
- A comprehensive t-test method was used for rigorous performance comparison.
- A total of 60 models were built and compared based on 327 real project samples.
- Decision tree underperformed SVM in accuracy, but outperformed it in cost analysis.
- A new rigorous model standard for software project risk analysis is established.
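The highlights mention t-test comparisons without specifying the exact procedure. One common option, assumed here purely for illustration, is a paired t-test on per-fold scores (for example, average misclassification cost) of two models evaluated on the same cross-validation folds. The sketch below uses only the Python standard library; the function name `paired_t_statistic`, the 10-fold setup, and the per-fold costs are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a: list[float], scores_b: list[float]) -> float:
    """Paired t statistic for two models scored on the same folds (hypothetical helper)."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least 2 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("fold differences have zero variance")
    # t = mean difference / standard error of the mean difference
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


# Two-sided critical value of Student's t for alpha = 0.05 and df = 9 (10 folds).
T_CRIT_DF9 = 2.262

# Hypothetical per-fold average costs for two models on the same 10 folds.
cost_a = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.50]
cost_b = [0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40, 0.40]

t = paired_t_statistic(cost_a, cost_b)
significant = abs(t) > T_CRIT_DF9
print(f"t = {t:.2f}, significant at 0.05: {significant}")
```

Here the hypothetical model B has lower cost on every fold, so the statistic lands far above the critical value. Because cross-validation training sets overlap, per-fold scores are not independent, and a plain paired t-test on them tends to overstate significance; corrected resampled variants are often preferred for that reason.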