Cold-Start Software Analytics

Software project artifacts such as source code, requirements, and change logs represent a gold mine of actionable information. As a result, software analytics solutions have been developed to mine repositories and answer questions such as "Who is the expert?", "Which classes are fault-prone?", or even "Who are the domain experts for these fault-prone classes?" Analytics often require training and configuration to maximize performance within the context of each project. A cold-start problem exists when an analytic function is applied within a project context without first being configured on project-specific data. This situation arises because of the non-trivial effort needed to instrument a project environment with candidate tools and algorithms and to empirically evaluate alternative configurations. We address the cold-start problem by comparatively evaluating 'best-of-breed' and 'profile-driven' solutions, both of which reuse known configurations in new project contexts. We describe our approach and evaluate it against 20 project datasets for three analytic areas: artifact connectivity, fault prediction, and finding the expert. The best-of-breed approach outperformed the profile-driven approach in all three areas. However, while it delivered acceptable results for artifact connectivity and finding the expert, both techniques underperformed for cold-start fault prediction.
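The following Python sketch makes the distinction between the two reuse strategies concrete. It shows one plausible reading of the strategies and is not the authors' implementation. It assumes that best-of-breed means choosing the configuration with the highest mean score across previously evaluated projects. It also assumes that profile-driven means borrowing the top configuration of the most similar known project, where similarity is the Euclidean distance between hypothetical, pre-normalized project profile vectors. The function names, the data layout, and the configuration labels (`vsm`, `lsi`, `lda`) are illustrative assumptions.

```python
import math
from statistics import mean
from typing import Dict, Tuple

# Hypothetical data layout: scores[project][config] holds a quality score
# (e.g. F-measure) obtained by evaluating `config` on that project's own data.
Scores = Dict[str, Dict[str, float]]
# Hypothetical project profile: a vector of normalized project characteristics.
Profile = Tuple[float, ...]


def best_of_breed(scores: Scores) -> str:
    """Return the configuration with the highest mean score across known projects."""
    # Sorting makes tie-breaking deterministic (alphabetical first wins).
    configs = sorted({c for per_project in scores.values() for c in per_project})
    return max(
        configs,
        # Average only over projects on which the configuration was evaluated.
        key=lambda c: mean(p[c] for p in scores.values() if c in p),
    )


def profile_driven(scores: Scores, profiles: Dict[str, Profile], new_profile: Profile) -> str:
    """Return the best configuration of the known project whose profile is
    closest (Euclidean distance) to the new project's profile."""
    nearest = min(sorted(profiles), key=lambda p: math.dist(profiles[p], new_profile))
    per_project = scores[nearest]
    return max(sorted(per_project), key=per_project.__getitem__)


if __name__ == "__main__":
    # Illustrative numbers only; not results from the study.
    scores = {
        "projA": {"vsm": 0.62, "lsi": 0.55, "lda": 0.48},
        "projB": {"vsm": 0.40, "lsi": 0.58, "lda": 0.45},
        "projC": {"vsm": 0.66, "lsi": 0.52, "lda": 0.50},
    }
    profiles = {"projA": (0.2, 0.9), "projB": (0.8, 0.1), "projC": (0.3, 0.7)}
    print(best_of_breed(scores))                          # vsm
    print(profile_driven(scores, profiles, (0.75, 0.2)))  # lsi
```

Neither function uses training data from the new project, and that is the defining constraint of the cold-start setting. The profile-driven variant needs only inexpensive descriptive features of the new project. The best-of-breed variant needs nothing from it at all.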
