Collecting software defect data automatically from web site of open-source software

With the rapid development of software engineering, improving software reliability has become increasingly important. Software reliability can be improved only through a deep understanding of software defects, and gaining that understanding requires a large number of defect samples. Open-source software projects have emerged in large numbers in recent years and have accumulated a huge amount of defect-related information, which provides valuable data for software defect research. This paper therefore proposes an approach for extracting defect data, including both defect information and defect samples, from open-source software. First, open-source software information is obtained from GitHub. Then, a defect data extraction method based on a support vector machine (SVM) is developed to identify defects in the obtained information. Finally, a database is established to manage the open-source software defect samples and their associated information. Experimental results show that the proposed method is effective and feasible.
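The abstract does not specify the features or the SVM formulation used to identify defects. The following is a minimal sketch of the identify-then-store steps under hypothetical choices that are not the paper's implementation. It uses bag-of-words features over issue text, a linear SVM without an intercept trained with the basic Pegasos sub-gradient method, and an SQLite table named `defect`. The names `featurize`, `train_linear_svm`, `is_defect`, and `store_defects` and the toy issues are illustrative. Fetching issues from GitHub is omitted.

```python
# Hypothetical sketch: features, labels, schema, and data are assumptions,
# not the method described in the paper.
import math
import random
import re
import sqlite3
from collections import Counter


def featurize(text):
    """Bag-of-words term counts, L2-normalised so long reports do not dominate."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def score(w, x):
    """Linear decision function w . x over sparse dict vectors."""
    return sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_linear_svm(texts, labels, lam=0.01, epochs=200, seed=0):
    """Basic Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.

    labels are +1 (defect) or -1 (not a defect); no intercept term is learned.
    """
    rng = random.Random(seed)
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    w = {}
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            step += 1
            eta = 1.0 / (lam * step)          # Pegasos step size 1/(lambda*t)
            violated = y * score(w, x) < 1    # margin measured with the current w
            for tok in w:                     # regulariser: w <- (1 - eta*lambda) w
                w[tok] *= 1.0 - eta * lam
            if violated:                      # hinge-loss sub-gradient step
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + eta * y * v
    return w


def is_defect(w, text):
    """Classify issue text as a defect when the SVM score is positive."""
    return score(w, featurize(text)) > 0


def store_defects(conn, issues, w):
    """Insert only the issues that the classifier flags as defects."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS defect ("
        "id INTEGER PRIMARY KEY, repo TEXT, title TEXT, body TEXT)"
    )
    for repo, title, body in issues:
        if is_defect(w, title + " " + body):
            conn.execute(
                "INSERT INTO defect (repo, title, body) VALUES (?, ?, ?)",
                (repo, title, body),
            )
    conn.commit()


# Toy, hand-labelled issue titles (hypothetical training data).
texts = [
    "app crash on startup",
    "error crash when saving file",
    "null pointer error in parser",
    "add dark mode feature",
    "feature request add export option",
    "update docs for install",
]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(texts, labels)

conn = sqlite3.connect(":memory:")
store_defects(conn, [
    ("demo/repo", "Crash on startup", "error shown"),
    ("demo/repo", "Add feature", "dark mode please"),
], w)
print(conn.execute("SELECT title FROM defect").fetchall())
# → [('Crash on startup',)]
```

Pegasos is used here only so the sketch runs without third-party packages. In practice, a library SVM such as scikit-learn's `LinearSVC` with TF-IDF features, trained on many labelled issues, would be the more usual choice.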