Comparison of Data Mining Techniques in the Cloud for Software Engineering

Mining software engineering data has recently become an important research topic, with the goal of improving software engineering processes, software productivity, and software quality. However, it poses several challenges, including high computational cost, hardware limitations, and data management issues (i.e., the availability, reliability, and security of data). To address these problems, this chapter proposes applying data mining techniques to software engineering data in the cloud environment, taking advantage of cloud computing benefits such as increased computing speed, scalability, flexibility, availability, and cost efficiency. It compares the performance of five classification algorithms (decision forest, neural network, support vector machine, logistic regression, and Bayes point machine) in the cloud in terms of both accuracy and runtime efficiency. It presents experimental studies conducted on five real-world software engineering datasets related to various software engineering tasks, including software defect prediction, software quality evaluation, vulnerability analysis, issue lifetime estimation, and code readability prediction. Experimental results show that the cloud is a powerful platform for building data mining applications for software engineering.
