A Machine Learning Based Ensemble Method for Automatic Multiclass Classification of Decisions

Stakeholders make various types of decisions regarding requirements, design, management, and other concerns throughout the software development life cycle. However, these decisions are typically not well documented or classified because of limited human resources, time, and budget. Automatic approaches offer a promising way to address this problem. In this paper, we aim to automatically classify decisions into five types to help stakeholders better document and understand them. First, we collected a dataset from the Hibernate developer mailing list. We then evaluated 270 configurations combining feature selection, feature extraction techniques, and machine learning classifiers to find the best configuration for classifying decisions. In particular, we applied an ensemble learning method and constructed ensemble classifiers to compare their performance with that of the base classifiers. Our experimental results show that (1) feature selection can noticeably improve the classification results; (2) ensemble classifiers can outperform base classifiers, provided that they are well constructed; and (3) Bag-of-Words (BoW) with 50% of the features retained by feature selection, combined with an ensemble classifier of Naïve Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM), achieves the best classification result among all the configurations (a weighted precision of 0.750, a weighted recall of 0.739, and a weighted F1-score of 0.727). Our work can benefit various types of stakeholders in software development by providing an automatic approach for effectively classifying decisions into specific types relevant to their interests.
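Finding (3) pairs an ensemble of NB, LR, and SVM with support-weighted precision, recall, and F1. The abstract does not say how the base predictions are combined, so the sketch below assumes plain hard (majority) voting, with ties going to the first model listed; the base classifiers themselves are omitted and replaced by hypothetical prediction lists over placeholder labels "A", "B", "C" (the five decision types are not named here). The metric function follows the usual weighted-average definition (as in scikit-learn's `average="weighted"`): weighted F1 is the support-weighted mean of per-class F1 scores, not the harmonic mean of weighted precision and weighted recall. This is why a weighted F1 below both of the other two, as in 0.727 versus 0.750 and 0.739, is not contradictory.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Hard (majority) voting over base-classifier predictions.

    predictions[m][i] is the label that base model m assigned to sample i.
    Assumption: ties go to the label of the earliest model in the list.
    """
    ensemble = []
    for votes in zip(*predictions):  # one tuple of votes per sample
        counts = Counter(votes)
        top = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        ensemble.append(next(label for label in votes if counts[label] == top))
    return ensemble


def weighted_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, float, float]:
    """Support-weighted precision, recall and F1 over the classes in y_true."""
    n = len(y_true)
    wp = wr = wf = 0.0
    for c, support in Counter(y_true).items():
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / support  # support > 0 for every class in y_true
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        weight = support / n  # share of samples whose true label is c
        wp += weight * precision
        wr += weight * recall
        wf += weight * f1  # mean of per-class F1, not F1 of (wp, wr)
    return wp, wr, wf


# Hypothetical labels and base-model outputs, for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
nb = ["A", "B", "B", "C", "C", "A"]
lr = ["B", "A", "B", "B", "B", "C"]
svm = ["A", "B", "C", "B", "C", "A"]

ensemble = majority_vote([nb, lr, svm])
print(ensemble)  # ['A', 'B', 'B', 'B', 'C', 'A']
print(tuple(round(x, 3) for x in weighted_prf(y_true, ensemble)))  # (0.722, 0.667, 0.656)
```

Soft voting, which averages the base models' predicted class probabilities instead of counting labels, is a common alternative; which variant the study used is not stated in the abstract.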
