TBIL: A Tagging-Based Approach to Identity Linkage Across Software Communities

Developers are often active in several software communities, such as StackOverflow and GitHub, yet their accounts in different communities are usually not connected to one another. Linking these accounts, a task known as identity linkage, is a prerequisite for many studies, for example investigating the activities of a single developer across two or more communities. Identity linkage has been studied extensively for social networks, but few of those methods can be adapted to software communities, because the user information these communities provide differs greatly from that available in social networks. We tackle this problem by introducing TBIL, a novel tagging-based approach to identity linkage across software communities. The key idea is to use developers' skills (measured by tags), usernames, and topics of interest as hints, and to link user identities with a decision-tree-based algorithm together with a heuristic greedy matching algorithm. We evaluate TBIL on two well-known software communities, StackOverflow and GitHub. The results show that our method is feasible and practical for linking developer identities; in particular, its F-score is 0.15 higher than that of previous identity linkage methods for software communities.
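The abstract names two ingredients, pairwise similarity hints and a heuristic greedy matching step, without giving their details. The following is a minimal sketch of how such a pipeline could look, not TBIL's actual implementation. It assumes that tag similarity is the cosine similarity of tag-count vectors and that username similarity is `difflib.SequenceMatcher.ratio()` on lower-cased names. The decision-tree classifier is replaced by a hypothetical fixed-weight score (`pair_score`, weights 0.6/0.4), the 0.5 acceptance threshold is likewise an assumption, and the topic-similarity hint is omitted for brevity.

```python
from collections import Counter
from difflib import SequenceMatcher
from math import sqrt


def tag_similarity(tags_a, tags_b):
    """Cosine similarity between two tag-frequency vectors (skills measured by tags)."""
    va, vb = Counter(tags_a), Counter(tags_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def username_similarity(name_a, name_b):
    """String similarity in [0, 1] of the lower-cased usernames."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def pair_score(so_user, gh_user):
    # Hypothetical stand-in for a trained decision-tree classifier:
    # a fixed weighted average of the two hints (weights are assumptions).
    return (0.6 * tag_similarity(so_user["tags"], gh_user["tags"])
            + 0.4 * username_similarity(so_user["name"], gh_user["name"]))


def greedy_match(so_users, gh_users, threshold=0.5):
    """Heuristic one-to-one matching: accept the highest-scoring pairs first."""
    candidates = [(pair_score(s, g), s["name"], g["name"])
                  for s in so_users for g in gh_users]
    candidates.sort(key=lambda c: c[0], reverse=True)
    matched_so, matched_gh, links = set(), set(), []
    for score, s, g in candidates:
        if score < threshold:
            break  # all remaining pairs score even lower
        if s in matched_so or g in matched_gh:
            continue  # each account may be linked at most once
        matched_so.add(s)
        matched_gh.add(g)
        links.append((s, g, round(score, 3)))
    return links


if __name__ == "__main__":
    so_users = [  # hypothetical StackOverflow profiles
        {"name": "alice_dev", "tags": ["python", "django", "python", "sql"]},
        {"name": "bob42", "tags": ["java", "spring", "maven"]},
    ]
    gh_users = [  # hypothetical GitHub profiles
        {"name": "AliceDev", "tags": ["python", "django", "flask"]},
        {"name": "bob-42", "tags": ["java", "maven", "gradle"]},
        {"name": "carol", "tags": ["rust", "wasm"]},
    ]
    print(greedy_match(so_users, gh_users))
    # [('alice_dev', 'AliceDev', 0.801), ('bob42', 'bob-42', 0.764)]
```

The greedy step enforces a one-to-one mapping: once an account is linked, every lower-scoring pair that involves it is skipped, so `carol` stays unmatched. In a fuller version, `pair_score` would be replaced by the probability output of a decision tree trained on labeled account pairs.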