PyDriller: Python framework for mining software repositories

Software repositories contain valuable historical information about the overall development of software systems. Mining software repositories (MSR) is now considered one of the most interesting growing fields within software engineering. MSR focuses on extracting and analyzing the data available in software repositories to uncover interesting, useful, and actionable information about a system. Even though MSR plays an important role in software engineering research, few tools have been created and made public to support developers in extracting information from Git repositories. In this paper, we present PyDriller, a Python framework that eases the process of mining Git. We compare our tool against the state-of-the-art Python framework GitPython, demonstrating that PyDriller can achieve the same results with, on average, 50% fewer lines of code (LOC) and significantly lower complexity.

URL: https://github.com/ishepard/pydriller
Materials: https://doi.org/10.5281/zenodo.1327363
Pre-print: https://doi.org/10.5281/zenodo.1327411
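As a rough illustration of the kind of mining task the abstract refers to, the sketch below counts commits per author. It assumes PyDriller 2.x, where the entry point is `Repository` and `traverse_commits()` yields commit objects exposing `author.name` (in PyDriller 1.x the class was named `RepositoryMining`). The helper names `commits_per_author` and `mine` are hypothetical and do not come from the paper.

```python
from collections import Counter
from types import SimpleNamespace


def commits_per_author(commits):
    """Count commits per author name for any iterable of commit-like objects."""
    return Counter(commit.author.name for commit in commits)


def mine(path_to_repo):
    """Count commits per author in a Git repository (assumes PyDriller 2.x is installed)."""
    # Imported lazily so commits_per_author stays usable without PyDriller.
    from pydriller import Repository  # named RepositoryMining in PyDriller 1.x

    return commits_per_author(Repository(path_to_repo).traverse_commits())


if __name__ == "__main__":
    # Stand-in objects that mimic only the attribute used above; with a real
    # repository one would call mine("path/to/repo") instead.
    demo = [SimpleNamespace(author=SimpleNamespace(name=n)) for n in ("ada", "bob", "ada")]
    print(commits_per_author(demo))
```

Keeping the counting logic separate from the traversal means the same helper can be exercised on stand-in objects or on commits from another source, which is one way to compare tools on an identical task.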
