Mining authorship characteristics in bug repositories

Bug reports are widely used to support software maintenance tasks. Since bug reports are contributed by people, the authorship characteristics of contributors may heavily affect how well these tasks are resolved. Poorly written bug reports may delay developers when fixing bugs. However, no in-depth investigation of these authorship characteristics has been conducted. In this study, we first leverage byte-level N-grams to model the authorship characteristics and employ Normalized Simplified Profile Intersection (NSPI) to measure the similarity between authorship characteristics. Then, we investigate a series of properties related to contributors’ authorship characteristics, including their evolution over time and their variation among distinct products in open source projects. Moreover, we show how to leverage the authorship characteristics to facilitate a well-known task in software maintenance, namely Bug Report Summarization (BRS). Experiments on open source projects validate that incorporating the authorship characteristics can effectively improve a state-of-the-art method in BRS. Our findings suggest that contributors should retain stable authorship characteristics and that the authorship characteristics can assist in resolving software tasks.

Highlights: This paper uses byte-level N-grams to model the writing styles of contributors in bug repositories, and introduces NSPI to measure the similarity between two writing styles. It studies several properties of contributors' writing styles, including how they change over time and how they vary across products. It then uses contributors' writing styles to help solve a typical software maintenance task, namely bug report summarization. The experimental data have been made publicly available. The experimental results show that leveraging developers' writing styles can effectively improve bug report summarization.
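To make the modeling step in the abstract more concrete, the following is a minimal sketch of a byte-level N-gram profile and an intersection-based similarity between two profiles. It is not the authors' implementation. The abstract does not state the NSPI formula, so dividing the intersection size by the size of the smaller profile is an assumption made here for illustration, as are the function names, the N-gram length of 3, and the profile size of 500.

```python
from collections import Counter


def byte_ngram_profile(text, n=3, size=500):
    """Return the set of the `size` most frequent byte-level n-grams in `text`.

    `n` and `size` are illustrative defaults, not values taken from the paper.
    """
    data = text.encode("utf-8")
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    return {gram for gram, _ in counts.most_common(size)}


def normalized_spi(profile_a, profile_b):
    """Size of the profile intersection, scaled into [0, 1].

    Assumption: normalization divides by the smaller profile's size;
    the paper's exact NSPI definition may differ.
    """
    if not profile_a or not profile_b:
        return 0.0
    shared = len(profile_a & profile_b)
    return shared / min(len(profile_a), len(profile_b))


if __name__ == "__main__":
    # Hypothetical bug report comments, used only to exercise the functions.
    early = "Steps to reproduce: open the editor, then click Save. It crashes."
    later = "Steps to reproduce: open the viewer, then click Print. It hangs."
    score = normalized_spi(byte_ngram_profile(early), byte_ngram_profile(later))
    print(f"similarity = {score:.2f}")
```

Under these assumptions, a score near 1 means two sets of comments share most of their frequent byte sequences, and a score near 0 means they share few. Comparing one contributor's profiles from different time periods or different products would follow the same pattern.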
