Understanding the syntactic rule usage in Java

Abstract

Context: Syntax is fundamental to any programming language because it defines which programs are valid. In the 1970s, computer scientists studied programming languages rigorously and empirically to guide and inform language design. Since then, language design has been more of an art, driven by the aesthetic concerns and intuitions of language architects. Recent studies have examined small sets of selected language features, but we still lack a comprehensive, quantitative, empirical analysis of how modern, real-world source code exercises the syntax of its programming language.

Objective: This study aims to understand how programming language syntax is used in actual development and to explore potential applications of the resulting syntax usage analysis.

Method: We present the results of the first such study, on Java, a modern, mature, and widely used programming language. Our corpus contains over 5,000 open-source Java projects totalling 150 million source lines of code (SLoC). We study both independent rule usage (i.e., applications of a single syntax rule) and dependent rule usage (i.e., applications of multiple syntax rules), and quantify how both vary over time and with project size.

Results: Our study provides detailed quantitative information and yields several insights. In particular, it (i) confirms the conventional wisdom that the usage of syntax rules is Zipfian; (ii) shows that the adoption of new rules, and their impact on the usage of pre-existing rules, vary significantly over time; and (iii) shows that rule usage is highly contextual.

Conclusions: Our findings suggest potential applications in language design, code suggestion and completion, automatic syntactic sugaring, and language restriction.
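
To make independent and dependent rule usage, and the Zipfian claim, more concrete, here is a minimal Java sketch. It is not the authors' tooling. It assumes programs have already been parsed into trees whose nodes carry the name of the grammar rule applied, uses hypothetical rule labels loosely modelled on JLS nonterminal names, approximates dependent usage by parent-to-child rule pairs only (the study's own definition may be broader), and checks for Zipf's law with a simple least-squares fit of log frequency against log rank on synthetic frequencies, not data from the study.

```java
import java.util.*;

/**
 * Minimal sketch, not the study's tooling: measures syntax-rule usage on a
 * parse tree whose nodes are labelled with the (hypothetical) grammar rule
 * that produced them.
 */
public class RuleUsage {

    /** A parse-tree node labelled with the grammar rule applied at that node. */
    record Node(String rule, List<Node> children) {
        Node(String rule, Node... children) {
            this(rule, List.of(children));
        }
    }

    /** Independent usage: how often each single rule is applied. */
    static Map<String, Integer> countRules(Node root) {
        Map<String, Integer> counts = new TreeMap<>();
        Deque<Node> stack = new ArrayDeque<>(List.of(root));
        while (!stack.isEmpty()) {          // traversal order does not affect counts
            Node n = stack.pop();
            counts.merge(n.rule(), 1, Integer::sum);
            n.children().forEach(stack::push);
        }
        return counts;
    }

    /** Assumed proxy for dependent usage: how often rule B is applied directly under rule A. */
    static Map<String, Integer> countPairs(Node root) {
        Map<String, Integer> counts = new TreeMap<>();
        Deque<Node> stack = new ArrayDeque<>(List.of(root));
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            for (Node c : n.children()) {
                counts.merge(n.rule() + " -> " + c.rule(), 1, Integer::sum);
                stack.push(c);
            }
        }
        return counts;
    }

    /**
     * Rough Zipf check: least-squares slope of log(frequency) against log(rank),
     * with rank 1 = most frequent. A slope near -1 is the classic Zipfian shape.
     * Needs at least two frequencies.
     */
    static double zipfSlope(Collection<Integer> freqs) {
        List<Integer> sorted = new ArrayList<>(freqs);
        sorted.sort(Comparator.reverseOrder());
        int n = sorted.size();
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double x = Math.log(i + 1);
            double y = Math.log(sorted.get(i));
            sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    public static void main(String[] args) {
        // Toy tree for "if (a) { x = 1; } else { x = 2; }" with made-up rule labels.
        Node tree = new Node("IfThenElseStatement",
            new Node("Expression", new Node("Name")),
            new Node("Block", new Node("ExpressionStatement",
                new Node("Assignment", new Node("Name"), new Node("Literal")))),
            new Node("Block", new Node("ExpressionStatement",
                new Node("Assignment", new Node("Name"), new Node("Literal")))));
        System.out.println(countRules(tree));
        System.out.println(countPairs(tree));
        // Synthetic frequencies that follow f = 1200 / rank exactly.
        System.out.printf(Locale.ROOT, "%.3f%n", zipfSlope(List.of(1200, 600, 400, 300, 240)));
    }
}
```

Running `main` prints per-rule counts (for example, `Name` is applied three times in the toy tree), parent-to-child pair counts, and a slope of -1.000 for the synthetic frequencies. On real corpus counts, a roughly straight log-log line is what "Zipfian" suggests, although a least-squares fit is only a coarse check.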
