A hosting service of multi-language historage repositories

In Mining Software Repositories research, source code repositories are one of the core data sources, since they contain both the product and the process of software development. A source code repository stores the versions of files and makes it possible to browse file histories, such as modification dates, authors, commit messages, and so on. Although such rich file-level history is easily available, extracting the histories of methods or functions, which are elements within source code files, is not easy with general code repositories. To tackle this difficulty, we have developed Historage, a fine-grained version control system. A Historage repository is a Git repository built upon an original Git repository. Therefore, mining techniques used for general Git repositories are also applicable to Historage repositories. We have also developed Kataribe, a hosting service for Historage repositories. It contains hundreds of Historage repositories constructed from GitHub repositories written in C#, Java, Python, and Ruby. The list of all Historage repositories and their original repositories is available at http://kataribe.naist.jp/public. With this dataset, we aim to promote in-depth, fine-grained software evolution research across a diversity of programming languages.
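
Because a Historage repository is an ordinary Git repository, the history of a single method can in principle be retrieved with the same commands used for file histories. The following is a minimal sketch under stated assumptions, not the authors' tooling: it assumes that each method is stored under its own path inside the Historage repository, and the names `parse_log`, `method_history`, and the example path are hypothetical. Only standard `git log` options are used.

```python
import subprocess
from typing import List, NamedTuple

# %x1f is the ASCII unit separator; it keeps fields apart even when a
# commit subject contains spaces, commas, or pipes.
LOG_FORMAT = "%H%x1f%an%x1f%aI%x1f%s"
FIELD_SEP = "\x1f"


class Commit(NamedTuple):
    sha: str
    author: str
    date: str      # author date in strict ISO 8601 form (%aI)
    subject: str   # first line of the commit message


def parse_log(output: str) -> List[Commit]:
    """Turn `git log --format=LOG_FORMAT` output into Commit records."""
    commits = []
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, author, date, subject = line.split(FIELD_SEP, 3)
        commits.append(Commit(sha, author, date, subject))
    return commits


def method_history(repo_dir: str, method_path: str) -> List[Commit]:
    """List the commits that touched one path in a Git repository.

    Assumption: `method_path` is the path under which the Historage
    repository stores a single method. The actual layout is not
    described in this abstract.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "--follow",
         f"--format={LOG_FORMAT}", "--", method_path],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

A call such as `method_history("path/to/historage-repo", "path/to/method")` would return one record per commit that changed that method, newest first. Both arguments are placeholders to be replaced with a real clone and a real path from it.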
