Multi-metric prediction of software build outcomes

Abstractness (A). Derived from the overall project or for each project package.

Afferent Coupling (Ca). Defined in terms of NoICR, the number of internal class references, and NoECR, the number of external class references. An internal class reference is one in which a class references another class within its own package; an external class reference is one in which a class references a class outside its own package. Derived from the overall project, for each project package, or at class (type) level.

Efferent Coupling (Ce). The total number of times a class makes references outside its own package. Derived from the overall project, for each project package, or at class (type) level.

Instability (I). I = Ce / (Ce + Ca), where Ce is efferent coupling and Ca is afferent coupling. Derived from the overall project, for each project package, or at class (type) level.

Normalized Distance (D). D = |A + I − 1|, where A is abstractness and I is instability. Derived from the overall project, for each project package, or at class (type) level. This metric indicates whether packages are balanced, with a good mix of abstraction and instability.

Complexity metrics (Table 38) indicate the complexity of the software and can be used towards design effort and software maintenance estimation. Cyclomatic complexity, developed by McCabe in 1976, measures the number of possible independent pathways through source code. To calculate it, a control flow graph of the program is generated, in which each node represents a source code command or sequence of commands, and each edge connecting two nodes represents a possible order of execution. For example, if the source code contains an "IF" statement with a single condition, there are two possible pathways through the code: one is taken if the condition evaluates to TRUE and the other if it evaluates to FALSE. If the code contains no loops or "IF" statements, there is only one possible pathway of execution.
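The instability and normalized distance formulas above can be sketched in a few lines. This is a minimal illustration, not part of the thesis: the function names and the example dependency counts are hypothetical, and a real tool would measure Ce, Ca, and A from the code base.

```python
def instability(ce: int, ca: int) -> float:
    """I = Ce / (Ce + Ca): 1.0 is maximally unstable, 0.0 maximally stable."""
    total = ce + ca
    # A package with no couplings at all is conventionally treated as stable here.
    return ce / total if total else 0.0

def normalized_distance(abstractness: float, i: float) -> float:
    """D = |A + I - 1|: 0.0 means the package lies on the balanced 'main sequence'."""
    return abs(abstractness + i - 1.0)

# Hypothetical package: 6 outgoing references, 2 incoming, 25% abstract types.
i = instability(ce=6, ca=2)          # 6 / 8 = 0.75
d = normalized_distance(0.25, i)     # |0.25 + 0.75 - 1| = 0.0, well balanced
```

A package scoring near D = 0 combines abstraction and instability in proportion, which is the balance the metric is designed to detect.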
Table 38 Complexity Metrics

Average Block Depth. Defined in terms of NoM, the number of methods, NoConstr, the number of constructors, and NoNC, the number of nested code blocks. Derived from the overall project or for each project package, class, or method. Try/catch blocks are included in the nesting count, and the value is 1 if the code is not nested. High average block depth values are considered detrimental.

Average Cyclomatic Complexity. Defined in terms of CC, the cyclomatic complexity, which is the number of pathways that can be taken through the code: CC = NoE − NoN + 2, where NoE is the number of edges and NoN is the number of nodes in the control flow graph; NoConstr is the number of constructors. Derived from the overall project or for each project package, class, or method. Cyclomatic complexity per method is the total number of pathways the code within that method can take.

Cohesion metrics provide insights into the responsibilities and design of software modules. High cohesion values indicate reusable and readable source code.
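The edges-and-nodes formula for cyclomatic complexity can be checked against the two cases discussed above (straight-line code versus a single "IF"). This sketch is illustrative only; the graph sizes are hypothetical hand-counted examples, not output from any analysis tool.

```python
def cyclomatic_complexity(num_edges: int, num_nodes: int, components: int = 1) -> int:
    """McCabe's formula CC = NoE - NoN + 2P, with P connected components (P = 1 for one method)."""
    return num_edges - num_nodes + 2 * components

# Straight-line code: 4 statement nodes chained by 3 edges -> one pathway.
assert cyclomatic_complexity(num_edges=3, num_nodes=4) == 1

# A single IF without an else: condition, then-block, and exit nodes, with edges
# condition->then, condition->exit, then->exit -> two pathways (TRUE and FALSE).
assert cyclomatic_complexity(num_edges=3, num_nodes=3) == 2
```

Each extra decision point adds one edge beyond what a linear chain would need, which is why CC grows by one per branch condition.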
