Expertise identification and visualization from CVS

As software evolves, identifying who holds expertise in which part of the system becomes an important problem. Clear component ownership, and team awareness of that ownership, are signs of a solid project. Both are also concerns in open-source software (OSS) projects, where membership is dynamic and team members arrive and leave. In large open-source projects, the specialists who know the system very well are considered experts. How can one identify the experts in a project by mining a particular repository, such as the source code? Have they received help from other people? We present an approach that classifies paths in the source code tree to derive the expertise of the committers. Because committers may get help from other people, we also retrieve their contributors. In addition, we provide a visualization for exploring the repository further by committer and by category. We demonstrate the approach with a prototype implementation, using the Apache HTTP Web server project as a case study.
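The idea of deriving expertise from source tree paths can be illustrated with a minimal sketch. It is not the prototype's implementation: the commit records, the rule of using the first two directory levels as the category, and the names `category_of`, `expertise_profiles`, and `top_experts` are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical commit records: (committer, path of the changed file).
# A real study would extract these from the CVS log.
COMMITS = [
    ("alice", "modules/ssl/ssl_engine.c"),
    ("alice", "modules/ssl/ssl_util.c"),
    ("alice", "server/core.c"),
    ("bob", "docs/manual/install.xml"),
    ("bob", "modules/proxy/mod_proxy.c"),
    ("bob", "docs/manual/upgrading.xml"),
]


def category_of(path, depth=2):
    """Assumed rule: the first `depth` directory levels name the category."""
    directories = path.split("/")[:-1]  # drop the file name
    return "/".join(directories[:depth]) or "(root)"


def expertise_profiles(commits, depth=2):
    """Count, per committer, how many changes fall into each category."""
    profiles = defaultdict(Counter)
    for committer, path in commits:
        profiles[committer][category_of(path, depth)] += 1
    return profiles


def top_experts(profiles, category):
    """Rank committers who touched a category, most changes first."""
    ranked = [
        (name, counts[category])
        for name, counts in profiles.items()
        if counts[category] > 0
    ]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


profiles = expertise_profiles(COMMITS)
print(top_experts(profiles, "modules/ssl"))
```

Counting changes per category is only one possible proxy for expertise; weighting by recency or by lines changed would be equally plausible under the same path classification.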
