Code style analytics for the automatic setting of formatting rules in IDEs: A solution to the Tabs vs. Spaces Debate

Code style matters because it conveys the meaning and intent of source code. Developers grow accustomed to reading code in their preferred style, but style guidelines vary between software teams and between companies. Code style decisions are typically made by the managers of software developers, but we would like to know how common the different variations of code style are. Automated tools can convert the code style of a file, but they must be configured manually. In this paper, we present a tool for collecting and analyzing code style metrics. We demonstrate the feasibility of scanning existing source code to automatically generate the code style rules for existing tools. We also examine the results of our data mining for trends in source code, performing a quantitative analysis of questions such as: How many functions are in a class, on average? How many lines of code are in a method, on average? Finally, we present graphs of the distributions of these data and examine outliers as special cases.
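As an illustration of scanning source code to derive a formatting rule, the following minimal sketch infers an indentation setting from leading whitespace. It is not the tool described in the paper: the function names `infer_indent_style` and `to_editorconfig` are hypothetical, the greatest-common-divisor heuristic for indent size is an assumption, and the output targets EditorConfig's `indent_style` and `indent_size` properties only as one example of an existing tool's rule format.

```python
from collections import Counter
from functools import reduce
from math import gcd


def infer_indent_style(source: str) -> dict:
    """Guess indentation rules from the leading whitespace of each line.

    Hypothetical helper: a crude heuristic, not the paper's method.
    """
    styles = Counter()   # votes for "tab" vs. "space"
    widths = set()       # distinct widths of space-indented lines
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        if not stripped:
            continue  # ignore blank or whitespace-only lines
        indent = line[: len(line) - len(stripped)]
        if not indent:
            continue  # unindented lines carry no information
        # Classify the line by its first indentation character.
        if indent[0] == "\t":
            styles["tab"] += 1
        else:
            styles["space"] += 1
            widths.add(len(indent))
    if not styles:
        return {}
    style = styles.most_common(1)[0][0]
    rules = {"indent_style": style}
    if style == "space":
        # Assumption: the indent unit is the GCD of all observed widths.
        # Alignment-style continuation lines can skew this estimate.
        rules["indent_size"] = reduce(gcd, widths)
    return rules


def to_editorconfig(rules: dict) -> str:
    """Render the inferred rules as an EditorConfig section for all files."""
    return "\n".join(["[*]"] + [f"{key} = {value}" for key, value in rules.items()])


sample = "def f():\n    if x:\n        return 1\n"
print(to_editorconfig(infer_indent_style(sample)))
```

A majority vote is used here so that a few stray lines in the other style do not flip the result; a real analysis would likely aggregate votes across many files and report the distribution as well as the winner.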
