Library adoption in public software repositories

We study the spread and adoption of libraries within Python projects hosted in public software repositories on GitHub. By modelling Git pull, merge, commit, and other actions as deliberate cognitive activities, we can better understand what happens when users adopt new and cognitively demanding information. For this task we introduce a large corpus containing all commits, diffs, messages, and source code from 259,690 Python repositories (about 13% of all Python projects on GitHub), including all Git activity data from 89,311 contributing users. In this initial work we ask two primary questions: (1) What kind of behavior change occurs near an adoption event? (2) Can we model the future adoption activity of a user? Using a fine-grained analysis of user behavior, we show that library adoptions are followed by higher-than-normal activity within the first 6 hours, implying that an adoption involves higher-than-normal cognitive effort. Further study is needed to understand the specific types of events that surround the adoption of new information, and the cause of these dynamics. We also show that a simple linear model can classify future commits as adoptions or non-adoptions, based on the commit contents and the preceding history of the user and repository. Additional work in this vein may be able to predict the content of future commits, or suggest new libraries to users.
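To make the notion of an adoption event concrete, the following is a minimal Python sketch. It assumes an adoption is the first commit in which a given user adds an import of a library they have not imported before; this operational definition is an assumption for illustration and is not stated in the abstract. The names `Commit`, `added_libraries`, `adoption_events`, and `commits_within` are hypothetical, and the regular expression only captures the first module on an `import a, b` line.

```python
import re
from typing import Iterable, List, NamedTuple, Set, Tuple

# Matches added diff lines such as "+import numpy as np" or "+from os.path import join".
# Relative imports ("from . import x") are intentionally not matched.
IMPORT_RE = re.compile(
    r"^\+\s*(?:import\s+([A-Za-z_][\w.]*)|from\s+([A-Za-z_][\w.]*)\s+import\b)"
)


class Commit(NamedTuple):
    """Hypothetical record for one commit (not the corpus schema)."""
    user: str
    timestamp: int  # seconds since the epoch
    diff: str       # unified diff text


def added_libraries(diff: str) -> Set[str]:
    """Return top-level module names imported on added ('+') diff lines."""
    libs: Set[str] = set()
    for line in diff.splitlines():
        if line.startswith("+++"):  # skip the file header of a unified diff
            continue
        match = IMPORT_RE.match(line)
        if match:
            name = match.group(1) or match.group(2)
            libs.add(name.split(".")[0])
    return libs


def adoption_events(commits: Iterable[Commit]) -> List[Tuple[str, str, int]]:
    """List (user, library, timestamp) for each user's first import of a library."""
    seen = {}  # user -> set of libraries already imported by that user
    events = []
    for commit in sorted(commits, key=lambda c: c.timestamp):
        known = seen.setdefault(commit.user, set())
        libs = added_libraries(commit.diff)
        for lib in sorted(libs - known):
            events.append((commit.user, lib, commit.timestamp))
        known |= libs
    return events


def commits_within(commits: Iterable[Commit], user: str, start: int,
                   window: int = 6 * 3600) -> int:
    """Count the user's commits strictly after `start` and within `window` seconds."""
    return sum(
        1 for c in commits
        if c.user == user and start < c.timestamp <= start + window
    )


history = [
    Commit("alice", 100, "+import numpy as np\n+x = 1"),
    Commit("bob", 150, "+import numpy"),
    Commit("alice", 200, "+from numpy.linalg import norm\n+import pandas"),
]
events = adoption_events(history)
```

With this toy history, `events` holds one adoption of `numpy` for each user and a later adoption of `pandas` for `alice`; `commits_within(history, "alice", 100)` then counts her activity in the six hours after her first adoption. A real analysis would also need to separate third-party libraries from standard-library and project-local modules.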
