Interests, Difficulties, Sentiments, and Tool Usages of Concurrency Developers: A Large-Scale Study on Stack Overflow

Context: Software developers are increasingly required to write concurrent code that is correct. However, they find correct concurrent programming challenging. To help these developers, it is necessary to understand the concurrency topics they are interested in, the difficulty they face in finding answers to questions on these topics, their sentiment toward these topics, and how they use concurrency tools and techniques to guarantee correctness. The interests, difficulties, sentiments, and tool usages of concurrency developers can affect their productivity.

Objective: In this work, we conduct a large-scale study on the entirety of Stack Overflow to understand the interests, difficulties, sentiments, and tool usages of concurrency developers.

Method: To conduct this study, we take the following major steps. First, we develop a set of concurrency tags to extract concurrency questions and answers from Stack Overflow. Second, we group these questions and answers into concurrency topics, categories, and a topic hierarchy. Third, we analyze the popularities, difficulties, and sentiments of these concurrency topics and their correlations. Fourth, we develop a set of race tool keywords to extract concurrency questions about data race tools and group these questions into race tool topics. We focus on data races because they are among the most prevalent concurrency bugs. Finally, we discuss the implications of our findings for the practice, research, and education of concurrent software development, relate our findings to those of previous work, and present a set of example questions that developers ask for each of our concurrency and tool topics as well as categories.

Results: A few findings of our study are: (1) the questions that concurrency developers ask can be grouped into a hierarchy of 27 concurrency topics under 8 major categories; (2) thread safety is among the most popular concurrency topics, and client-server concurrency is among the least popular; (3) irreproducible behavior is among the most difficult topics, and memory consistency is among the least difficult; (4) data scraping is among the most positive concurrency topics, and irreproducible behavior is among the most negative; (5) root cause identification has the largest number of questions about the usage of data race tools, and alternative use has the fewest. While some of our findings agree with those of previous work, others sharply contrast with them.

Conclusion: The results of our study can help not only concurrency developers but also concurrency educators and researchers to better decide where to focus their efforts, by trading off one concurrency topic against another.

*Corresponding author.
Email addresses: mbagherzadeh@oakland.edu (Mehdi Bagherzadeh), sfahmed@oakland.edu (Syed Ahmed), ssripathi@oakland.edu (Srilakshmi Sripathi), raffi.khatchadourian@hunter.cuny.edu (Raffi Khatchadourian)

Preprint submitted to Elsevier, September 13, 2021
arXiv:2109.03138v3 [cs.SE] 10 Sep 2021
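To make the Method's first step concrete, the tag-based extraction of concurrency posts could be sketched as follows. This is an illustrative sketch only, not the paper's actual pipeline: the tag set and the post dictionaries below are hypothetical placeholders, and a real run would read the Stack Overflow data dump instead.

```python
# Hypothetical sketch of tag-based extraction of concurrency posts.
# CONCURRENCY_TAGS and the sample posts are illustrative, not the
# paper's actual tag set or data.
CONCURRENCY_TAGS = {
    "multithreading", "concurrency", "thread-safety",
    "mutex", "deadlock", "race-condition",
}

def is_concurrency_post(post):
    """Return True if the post carries at least one concurrency tag."""
    return bool(CONCURRENCY_TAGS & set(post["tags"]))

# Stand-in for rows from the Stack Overflow data dump.
posts = [
    {"id": 1, "tags": ["java", "multithreading"]},
    {"id": 2, "tags": ["python", "pandas"]},
    {"id": 3, "tags": ["c++", "race-condition"]},
]

concurrency_posts = [p for p in posts if is_concurrency_post(p)]
print([p["id"] for p in concurrency_posts])  # [1, 3]
```

The extracted subset would then feed the later steps (topic modeling, popularity, difficulty, and sentiment analysis) described in the Method.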
