How not to survey developers and repositories: experiences analyzing language adoption

We present cross-sectional analyses of programming language use and reflect upon our experience in doing so. In particular, we directly analyze groups of 1,500-13,000 developers by using questionnaires and 260,000 developers indirectly so by mining 210,000 software repositories. Our analysis reveals programming language adoption phenomena surrounding developer age, birth year, workplace, and software repository preference. We find that survey methods are increasingly accessible and relevant, but there are distinctive problems in examining developers and code repositories. We show that analyzing software repositories suffers from sample bias problems similar to those encountered when directly polling developers. Such bias limits the general validity of research claims based on analysis of software repositories. We aid future empirical researchers by describing concrete practices and opportunities to improve the results of developer and software repository surveys.