Sharing Data and Models in Software Engineering

Data Science for Software Engineering: Sharing Data and Models presents guidance and procedures for reusing data and models between projects to produce results that are useful and relevant. Starting with a background section of practical lessons and warnings for beginner data scientists in software engineering, this edited volume proceeds to identify critical questions of contemporary software engineering related to data and models. Learn how to adapt data from other organizations to local problems, mine privatized data, prune spurious information, simplify complex results, update models for new platforms, and more. Chapters share largely applicable experimental results, discussed with a blend of practitioner-focused domain expertise and commentary. Each chapter is written by a prominent expert and offers a state-of-the-art solution to an identified problem facing data scientists in software engineering. Throughout, the editors share best practices collected from their experience training software engineering students and practitioners to master data science, and highlight the methods that are most useful and applicable to the widest range of projects.

Shares the specific experience of leading researchers and techniques developed to handle data problems in the realm of software engineering
Explains how to start a project of data science for software engineering as well as how to identify and avoid likely pitfalls
Provides a wide range of useful qualitative and quantitative principles, ranging from very simple to cutting-edge research
Addresses current challenges with software engineering data, such as lack of local data, access issues due to data privacy, and increasing data quality by cleaning spurious chunks of data

Table of Contents
Introduction
Data Science 101
Cross company data: Friend or Foe?
Pruning: Relevancy is the Removal of Irrelevancy
Easy Path: Smarter Design
Instance Weighting: How not to elaborate on analogies
Privacy: Data in Disguise
Stability: How to find a silver-bullet model?
Complexity: How to ensemble multiple models?


[241]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[242]  Cynthia Dwork,et al.  Differential Privacy: A Survey of Results , 2008, TAMC.

[243]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[244]  Andrew Begel,et al.  Analyze this! 145 questions for data scientists in software engineering , 2013, ICSE.

[245]  Hongfang Liu,et al.  Theory of relative defect proneness , 2008, Empirical Software Engineering.

[246]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[247]  Marcus A. Maloof,et al.  Using additive expert ensembles to cope with concept drift , 2005, ICML.

[248]  Yue Jiang,et al.  Techniques for evaluating fault prediction models , 2008, Empirical Software Engineering.

[249]  Martin J. Shepperd,et al.  Estimating Software Project Effort Using Analogies , 1997, IEEE Trans. Software Eng..

[250]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[251]  Tzung-Pei Hong,et al.  Efficient sanitization of informative association rules , 2008, Expert Syst. Appl..

[252]  Shari Lawrence Pfleeger,et al.  An empirical study of maintenance and development estimation accuracy , 2002, J. Syst. Softw..

[253]  José Francisco Martínez Trinidad,et al.  A review of instance selection methods , 2010, Artificial Intelligence Review.

[254]  Fayola Peters,et al.  CLIFF: Finding Prototypes for Nearest Neighbor Algorithms with Application to Forensic Trace Evidence , 2010 .

[255]  Premkumar T. Devanbu,et al.  Ecological inference in empirical software engineering , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[256]  Garima Verma,et al.  Software Defects and Object Oriented Metrics - An Empirical Analysis , 2010 .

[257]  Michael W. Godfrey,et al.  "Cloning Considered Harmful" Considered Harmful , 2006, 2006 13th Working Conference on Reverse Engineering.

[258]  H. E. Dunsmore,et al.  Software engineering metrics and models , 1986 .

[259]  Thong Ngee Goh,et al.  A study of mutual information based feature selection for case based reasoning in software cost estimation , 2009, Expert Syst. Appl..

[260]  Roland Kuhn,et al.  Discriminative Instance Weighting for Domain Adaptation in Statistical Machine Translation , 2010, EMNLP.

[261]  Uri Lipowezky Selection of the optimal prototype subset for 1-NN classification , 1998, Pattern Recognit. Lett..

[262]  Tim Menzies,et al.  Data Mining for Very Busy People , 2003, Computer.

[263]  Barry W. Boehm,et al.  An analysis of trends in productivity and cost drivers over years , 2011, Promise '11.

[264]  Doo-Hwan Bae,et al.  An empirical analysis of software effort estimation with outlier elimination , 2008, PROMISE '08.

[265]  Xin Yao,et al.  Performance Scaling of Multi-objective Evolutionary Algorithms , 2003, EMO.

[266]  Laurie A. Williams,et al.  Evaluating Complexity, Code Churn, and Developer Activity Metrics as Indicators of Software Vulnerabilities , 2011, IEEE Transactions on Software Engineering.

[267]  Pierangela Samarati,et al.  Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression , 1998 .

[268]  Amos Storkey,et al.  When Training and Test Sets are Different: Characterising Learning Transfer , 2013 .

[269]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[270]  Elisa Bertino,et al.  State-of-the-art in privacy preserving data mining , 2004, SGMD.

[271]  Jacob Cohen,et al.  A power primer. , 1992, Psychological bulletin.

[272]  Javam C. Machado,et al.  The prediction of faulty classes using object-oriented design metrics , 2001, J. Syst. Softw..

[273]  Tim Menzies,et al.  Size doesn't matter?: on the value of software size features for effort estimation , 2012, PROMISE '12.

[274]  Geoffrey I. Webb,et al.  Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining , 2009, J. Mach. Learn. Res..

[275]  Mohammad Azzeh,et al.  Software effort estimation based on optimized model tree , 2011, Promise '11.

[276]  Thong Ngee Goh,et al.  A study of the non-linear adjustment for analogy based software cost estimation , 2009, Empirical Software Engineering.

[277]  Guangchun Luo,et al.  Transfer learning for cross-company software defect prediction , 2012, Inf. Softw. Technol..

[278]  Hausi A. Müller,et al.  Predicting fault-proneness using OO metrics. An industrial case study , 2002, Proceedings of the Sixth European Conference on Software Maintenance and Reengineering.

[279]  Kun Liu,et al.  On the Privacy of Euclidean Distance Preserving Data Perturbation , 2009, ArXiv.

[280]  Zhi-Hua Zhou,et al.  Sample-based software defect prediction with active and semi-supervised learning , 2012, Automated Software Engineering.

[281]  Barbara A. Kitchenham,et al.  Effort estimation using analogy , 1996, Proceedings of IEEE 18th International Conference on Software Engineering.

[282]  David D. Lewis,et al.  Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval , 1998, ECML.

[283]  F. Wilcoxon Individual Comparisons by Ranking Methods , 1945 .

[284]  Sinno Jialin Pan,et al.  Transfer defect learning , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[285]  C. Chabris,et al.  Gorillas in Our Midst: Sustained Inattentional Blindness for Dynamic Events , 1999, Perception.

[286]  Barry Boehm,et al.  Bayesian analysis of software cost and quality models , 1999 .

[287]  Magne Jørgensen,et al.  A review of studies on expert estimation of software development effort , 2004, J. Syst. Softw..

[288]  N. Nagappan,et al.  Static analysis tools as early indicators of pre-release defect density , 2005, Proceedings. 27th International Conference on Software Engineering, 2005. ICSE 2005..

[289]  S. Muthukrishnan,et al.  Data streams: algorithms and applications , 2005, SODA '03.

[290]  Gregorio Robles,et al.  Replicating MSR: A study of the potential replicability of papers published in the Mining Software Repositories proceedings , 2010, 2010 7th IEEE Working Conference on Mining Software Repositories (MSR 2010).

[291]  Dino Pedreschi,et al.  Anonymity preserving pattern discovery , 2008, The VLDB Journal.

[292]  Barry W. Boehm,et al.  Finding the right data for software cost modeling , 2005, IEEE Software.

[293]  Lloyd A. Smith,et al.  Practical feature subset selection for machine learning , 1998 .

[294]  A. Vargha,et al.  A Critique and Improvement of the CL Common Language Effect Size Statistics of McGraw and Wong , 2000 .

[295]  Ayse Basar Bener,et al.  ENNA: software effort estimation using ensemble of neural networks with associative memory , 2008, SIGSOFT '08/FSE-16.

[296]  Tim Menzies,et al.  How to Find Relevant Data for Effort Estimation? , 2011, 2011 International Symposium on Empirical Software Engineering and Measurement.

[297]  Taghi M. Khoshgoftaar,et al.  Evolutionary Optimization of Software Quality Modeling with Multiple Repositories , 2010, IEEE Transactions on Software Engineering.

[298]  Thomas G. Dietterich,et al.  Improving SVM accuracy by training on auxiliary data sources , 2004, ICML.

[299]  Gerald Reif,et al.  Supporting developers with natural language queries , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[300]  Sudha Ram,et al.  Constrained cascade generalization of decision trees , 2004, IEEE Transactions on Knowledge and Data Engineering.

[301]  Douglas Fisher,et al.  Machine Learning Approaches to Estimating Software Development Effort , 1995, IEEE Trans. Software Eng..

[302]  Tim Menzies,et al.  Privacy and utility for defect prediction: Experiments with MORPH , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[303]  Andrew Chin,et al.  Differential Privacy as a Response to the Reidentification Threat: The Facebook Advertiser Case Study , 2012 .

[304]  Tim Menzies,et al.  Optimizing requirements decisions with keys , 2008, PROMISE '08.

[305]  Qiang Yang,et al.  Transferring Naive Bayes Classifiers for Text Classification , 2007, AAAI.

[306]  Jeffrey C. Carver,et al.  Knowledge-Sharing Issues in Experimental Software Engineering , 2004, Empirical Software Engineering.

[307]  Elaine J. Weyuker,et al.  Where the bugs are , 2004, ISSTA '04.

[308]  Ethem Alpaydin,et al.  Introduction to machine learning , 2004, Adaptive computation and machine learning.

[309]  G DietterichThomas An Experimental Comparison of Three Methods for Constructing Ensembles of Decision Trees , 2000 .

[310]  Kjetil Moløkken-Østvold,et al.  A review of software surveys on software effort estimation , 2003, 2003 International Symposium on Empirical Software Engineering, 2003. ISESE 2003. Proceedings..

[311]  Raed Shatnawi,et al.  The effectiveness of software metrics in identifying error-prone classes in post-release software evolution process , 2008, J. Syst. Softw..

[312]  Yue Jiang,et al.  Cost Curve Evaluation of Fault Prediction Models , 2008, 2008 19th International Symposium on Software Reliability Engineering (ISSRE).

[313]  Chris F. Kemerer,et al.  An empirical validation of software cost estimation models , 1987, CACM.

[314]  James C. Bezdek,et al.  Nearest prototype classifier designs: An experimental study , 2001, Int. J. Intell. Syst..

[315]  B. Baskeles,et al.  Software effort estimation using machine learning methods , 2007, 2007 22nd international symposium on computer and information sciences.

[316]  Qiang Yang,et al.  A Survey on Transfer Learning , 2010, IEEE Transactions on Knowledge and Data Engineering.

[317]  Lionel C. Briand,et al.  Exploring the relationships between design measures and software quality in object-oriented systems , 2000, J. Syst. Softw..

[318]  Emilia Mendes,et al.  Using Chronological Splitting to Compare Cross- and Single-company Effort Models: Further Investigation , 2009, ACSC.

[319]  Vasilios Zorkadis,et al.  Efficient information theoretic extraction of higher order features for improving neural network-based spam e-mail categorization , 2006, J. Exp. Theor. Artif. Intell..

[320]  Jiawei Han,et al.  Knowledge transfer via multiple model local structure mapping , 2008, KDD.

[321]  Bart Baesens,et al.  Benchmarking Classification Models for Software Defect Prediction: A Proposed Framework and Novel Findings , 2008, IEEE Transactions on Software Engineering.

[322]  John Platt,et al.  FastMap, MetricMap, and Landmark MDS are all Nystrom Algorithms , 2005, AISTATS.

[323]  I. M. Alsmadi,et al.  Selecting a standard set of attributes for cost estimation of software projects , 2012, 2012 International Conference on Computer, Information and Telecommunication Systems (CITS).

[324]  Wei Zhao,et al.  Privacy-Preserving Data Mining Systems , 2007, Computer.

[325]  Raymond J. Mooney,et al.  Mapping and Revising Markov Logic Networks for Transfer Learning , 2007, AAAI.

[326]  Xin Yao,et al.  A principled evaluation of ensembles of learning machines for software effort estimation , 2011, Promise '11.

[327]  Michael Gertz,et al.  Mining email social networks , 2006, MSR '06.

[328]  Romain Robbes,et al.  Software systems as cities: a controlled experiment , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[329]  Xiao-Bai Li,et al.  Identity disclosure protection: A data reconstruction approach for privacy-preserving data mining , 2009, Decis. Support Syst..

[330]  José Francisco Martínez Trinidad,et al.  A new fast prototype selection method based on clustering , 2010, Pattern Analysis and Applications.

[331]  Xin Yao,et al.  Using unreliable data for creating more reliable online learners , 2012, The 2012 International Joint Conference on Neural Networks (IJCNN).

[332]  Xin Yao,et al.  Multi-Objective Approaches to Optimal Testing Resource Allocation in Modular Software Systems , 2010, IEEE Transactions on Reliability.

[333]  Claes Wohlin,et al.  Context in industrial software engineering research , 2009, 2009 3rd International Symposium on Empirical Software Engineering and Measurement.

[334]  Rachel Harrison,et al.  Multiobjective simulation optimisation in software project management , 2011, GECCO '11.

[335]  Les Hatton,et al.  Does OO Sync with How We Think? , 1998, IEEE Softw..

[336]  D. Ross Jeffery,et al.  An Empirical Study of Analogy-based Software Effort Estimation , 1999, Empirical Software Engineering.

[337]  Lionel C. Briand,et al.  An assessment and comparison of common software cost estimation modeling techniques , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[338]  Tim Menzies,et al.  When to use data from other projects for effort estimation , 2010, ASE.

[339]  Jean-Marc Desharnais,et al.  A comparison of software effort estimation techniques: Using function points with neural networks, case-based reasoning and regression models , 1997, J. Syst. Softw..

[340]  Bart Baesens,et al.  Data Mining Techniques for Software Effort Estimation: A Comparative Study , 2012, IEEE Transactions on Software Engineering.

[341]  Geoffrey I. Webb,et al.  Discretization for naive-Bayes learning: managing discretization bias and variance , 2008, Machine Learning.

[342]  Alessandro Orso,et al.  Camouflage: automated anonymization of field data , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[343]  Lawrence O Gostin,et al.  Health information privacy. , 1995, Cornell law review.

[344]  Martin J. Shepperd,et al.  Comparing Software Prediction Techniques Using Simulation , 2001, IEEE Trans. Software Eng..

[345]  G Gigerenzer,et al.  Reasoning the fast and frugal way: models of bounded rationality. , 1996, Psychological review.

[346]  Peter J. Bickel,et al.  Maximum Likelihood Estimation of Intrinsic Dimension , 2004, NIPS.

[347]  Ricardo Massa Ferreira Lima,et al.  GA-based method for feature selection and parameters optimization for machine learning regression applied to software effort estimation , 2010, Inf. Softw. Technol..

[348]  Oussama El-Rawas,et al.  Understanding the Value of Software Engineering Technologies , 2009, 2009 IEEE/ACM International Conference on Automated Software Engineering.

[349]  Brendan Murphy The difficulties of building generic reliability models for software , 2011, Empirical Software Engineering.

[350]  Ferdinand Hergert,et al.  Improving model selection by nonconvergent methods , 1993, Neural Networks.

[351]  Stephen G. MacDonell,et al.  Evaluating prediction systems in software project estimation , 2012, Inf. Softw. Technol..

[352]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[353]  Milde M. S. Lira,et al.  Combining Multiple Artificial Neural Networks Using Random Committee to Decide upon Electrical Disturbance Classification , 2007, 2007 International Joint Conference on Neural Networks.

[354]  Jr. Frederick P. Brooks,et al.  The mythical man-month (anniversary ed.) , 1995 .

[355]  Thomas Zimmermann,et al.  Information needs for software development analytics , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[356]  Arvinder Kaur,et al.  Empirical validation of object-oriented metrics for predicting fault proneness models , 2010, Software Quality Journal.

[357]  ASHWIN MACHANAVAJJHALA,et al.  L-diversity: privacy beyond k-anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[358]  Tim Menzies,et al.  Special issue on repeatable results in software engineering prediction , 2012, Empirical Software Engineering.

[359]  Barry W. Boehm Safe and Simple Software Cost Analysis , 2000, IEEE Software.

[360]  Xin Yao,et al.  Software effort estimation as a multiobjective learning problem , 2013, TSEM.

[361]  Emilia Mendes,et al.  How effective is Tabu search to configure support vector regression for effort estimation? , 2010, PROMISE '10.

[362]  Karen T. Lum,et al.  Stable rankings for different effort models , 2010, Automated Software Engineering.