Automatically Modeling Developer Programming Ability and Interest Across Software Communities

Developer profiles play an important role in software project planning, developer recommendation, personnel training, and other tasks. Modeling developers' ability and interests is the central problem in building such profiles. However, most existing approaches require manual assessment, such as 360° performance evaluation. With the emergence of social networking sites such as StackOverflow and GitHub, a vast amount of developer information is created on a daily basis. Such personal and social context data has great potential to support automatic and effective evaluation of developer ability and mining of developer interests. In this paper, we propose CPDScorer, a novel approach for modeling and scoring the programming ability and interests of developers by mining heterogeneous information from both community question answering (CQA) sites and open-source software (OSS) communities. CPDScorer analyzes the questions and answers posted on CQA sites and evaluates the projects submitted to OSS communities in order to assign expertise scores and interest scores to developers, considering both quantitative and qualitative factors. To profile a developer's ability and interests, we also design a programming term extraction algorithm based on set covering. We have conducted experiments on StackOverflow and GitHub to measure the effectiveness of CPDScorer. The results show that our approach is feasible and practical for modeling users' programming ability and interests. In particular, the precision of our approach reaches 80%.
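
The abstract does not spell out the term extraction algorithm, only that it is based on set covering. As a rough illustration of that general idea, and not of CPDScorer's actual procedure, the sketch below applies the standard greedy set-cover approximation: it repeatedly picks the term that covers the most still-uncovered posts. The function name `greedy_term_cover`, the input layout, and the sample terms and post IDs are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_term_cover(term_to_docs: Dict[str, Set[str]]) -> List[str]:
    """Greedy set-cover approximation (an illustrative assumption,
    not the algorithm from the paper).

    term_to_docs maps a candidate programming term to the set of
    post or project IDs in which it occurs. The function returns a
    small list of terms that together cover every ID.
    """
    # Every document that appears under at least one term must be covered.
    uncovered: Set[str] = set().union(*term_to_docs.values())
    chosen: List[str] = []

    while uncovered:
        # Pick the term covering the most uncovered documents.
        # Iterating over sorted keys makes ties resolve alphabetically.
        best = max(
            sorted(term_to_docs),
            key=lambda term: len(term_to_docs[term] & uncovered),
        )
        gained = term_to_docs[best] & uncovered
        if not gained:
            break  # defensive: nothing left that any term can cover
        chosen.append(best)
        uncovered -= gained

    return chosen


# Hypothetical data: four candidate terms over five posts.
EXAMPLE = {
    "python": {"q1", "q2", "q3"},
    "pandas": {"q2", "q3"},
    "java": {"q4"},
    "spring": {"q4", "q5"},
}

if __name__ == "__main__":
    print(greedy_term_cover(EXAMPLE))
```

On this sample input the greedy choice is `python` first (three posts) and then `spring` (the remaining two), so two terms suffice to describe all five posts. The greedy rule does not guarantee a minimum cover, but it is a common and simple approximation for set-cover problems.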
