Advances in artificial intelligence, together with machine learning support in programming languages such as Python and in cloud services, allow researchers to cluster and predict behaviors and patterns in software engineering data. However, these methods need large amounts of data to achieve high accuracy across different contexts. This paper introduces Sonarlizer Xplorer, a tool that captures a large number of technical debt items and code metrics from public GitHub projects. Sonarlizer Xplorer is composed of two sub-tools. The first, Github Xplorer, mines public GitHub repositories starting from an initial project. The second, Sonarlizer, takes the mined projects and analyzes them with SonarQube. We ran the tool for four months, collecting technical debt items and code metrics from almost 46,000 public Java projects. In addition, we mined over 57 million repositories and 4 million users.