Do Java programmers write better Python? Studying off-language code quality on GitHub

There are style guides and best practices for many programming languages. Their goal is to promote uniformity and readability of code, consequentially reducing the chance of errors. While programmers who are frequently using the same programming language tend to internalize most of its best practices eventually, little is known about what happens when they casually switch languages and write code in a less familiar language. Insights into the factors that lead to coding convention violations could help to improve tutorials for programmers switching languages, make teachers aware of mistakes they might expect depending on what language students have been using before, or influence the order in which programming languages are taught. To approach this question, we make use of a large-scale data set representing a major part of the open source development activity happening on GitHub. In this data set, we search for Java and C++ programmers that occasionally program Python and study their Python code quality using a lint tool. Comparing their defect rates to those from Python programmers reveals significant effects in both directions: We observe that some of Python's best practices have more widespread adoption among Java and C++ programmers than Python experts. At the same time, python-specific coding conventions, especially indentation, scoping, and the use of semicolons, are violated more frequently. We conclude that programming off-language is not generally associated with better or worse code quality, but individual coding conventions are violated more or less frequently depending on whether they are more universal or language-specific. We intend to motivate a discussion and more research on what causes these effects, how we can mitigate or use them for good, and which related effects can be studied using the presented data set.

[1]  Georgios Gousios,et al.  The GHTorent dataset and tool suite , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[2]  Shriram Krishnamurthi,et al.  Can we crowdsource language design? , 2017, Onward!.

[3]  Premkumar T. Devanbu,et al.  A large scale study of programming languages and code quality in github , 2014, SIGSOFT FSE.

[4]  Ioannis Stamelos,et al.  Code quality analysis in open source software development , 2002, Inf. Syst. J..

[5]  Jan Vitek,et al.  DéjàVu: a map of code duplicates on GitHub , 2017, Proc. ACM Program. Lang..