SCSMiner: mining social coding sites for software developer recommendation with relevance propagation

With the advent of social coding sites, software development has entered a new era of collaborative work. Social coding sites (e.g., GitHub) integrate social networking and distributed version control in a unified platform to facilitate collaborative development around the world. One unique characteristic of such sites is that the past development experience that developers record there conveys implicit signals of their programming capability and expertise, which can be applied in many areas, such as software developer recruitment for IT corporations. Motivated by this intuition, we aim to develop a framework that effectively locates developers with the right coding skills. To achieve this goal, we devise a generative probabilistic expert ranking model, incorporate consistency among projects as graph regularization to enhance the expert ranking, and introduce a relevance propagation perspective to illustrate the model. For evaluation, StackOverflow is leveraged to complement the expert ground truth. Finally, we implement and demonstrate a prototype system, SCSMiner, which provides an expert search service based on a real-world dataset crawled from GitHub.
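The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of graph-regularized relevance propagation, not the SCSMiner method itself. It assumes a generic iterative smoothing scheme in the style of local and global consistency: per-project relevance scores from some base model are repeatedly mixed with the scores of similar projects, and developers are then ranked by the propagated scores of the projects they contributed to. The function names, the similarity matrix, the `alpha` parameter, and the contribution weights are all hypothetical.

```python
import math


def propagate_relevance(weights, initial_scores, alpha=0.5, iterations=50):
    """Smooth project relevance scores over a project similarity graph.

    weights: symmetric n x n list of lists of non-negative similarities
             (hypothetical input, e.g. shared contributors or topics).
    initial_scores: length-n list of query relevance scores per project.
    alpha: share of each score taken from neighbours (assumed value).
    """
    n = len(initial_scores)
    degrees = [sum(row) for row in weights]

    # Symmetrically normalised adjacency: S = D^{-1/2} W D^{-1/2}.
    norm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if weights[i][j] > 0 and degrees[i] > 0 and degrees[j] > 0:
                norm[i][j] = weights[i][j] / math.sqrt(degrees[i] * degrees[j])

    # Iterate f <- alpha * S f + (1 - alpha) * y, starting from y.
    scores = list(initial_scores)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * initial_scores[i]
            for i in range(n)
        ]
    return scores


def rank_developers(project_scores, contributions):
    """Rank developers by the weighted sum of their projects' scores.

    contributions: dict mapping developer -> {project index: weight}
                   (hypothetical association data).
    """
    totals = {
        dev: sum(w * project_scores[p] for p, w in projects.items())
        for dev, projects in contributions.items()
    }
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    # Projects 0 and 1 are similar; project 2 is isolated.
    similarity = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
    base = [1.0, 0.0, 0.0]  # only project 0 matches the query directly
    smoothed = propagate_relevance(similarity, base)
    devs = {"alice": {0: 1.0}, "bob": {1: 1.0}, "carol": {2: 1.0}}
    print(smoothed)
    print(rank_developers(smoothed, devs))
```

In this toy graph, project 1 receives some relevance from its similar neighbour even though it did not match the query directly, while the isolated project 2 receives none. That is the intuition behind using consistency among projects as a regularizer.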
