Project Replayer with Email Analysis - Revealing Contexts in Software Development

In many software development projects, people tend to repeat the same mistakes because knowledge from past experience is not shared. It is generally very difficult to find valuable phenomena manually in large volumes of project data. Invisible context, which cannot be learned directly from software documents or formal reports, is an important factor in these difficulties. We propose a new method for finding such contexts by analyzing the email archives in a project repository. In this method, we first apply natural language processing to extract keywords from email messages. Next, similarities among the messages are calculated from the extracted keywords, and the messages are grouped into clusters according to those similarities. The clustering result can be presented alongside other information such as code growth graphs or schedule charts. The method is implemented as an extension to the Project Replayer, a tool for reviewing past project data. A pilot analysis confirms that a researcher could grasp important contexts of failures in actual projects using the Project Replayer.
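The three steps described above (keyword extraction, pairwise similarity, clustering) can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, similarity measure, or clustering algorithm, so everything here is an assumption chosen for illustration: a small stopword filter stands in for the natural language processing step, keywords are weighted with smoothed TF-IDF, similarity is cosine similarity, and clusters are formed by linking any two messages whose similarity reaches a threshold. The function names, the stopword list, the sample messages, and the threshold value are all hypothetical and are not taken from the Project Replayer.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a proper NLP pipeline.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "on",
             "we", "it", "this", "that", "be", "with"}

# Hypothetical sample data, not taken from any real project archive.
SAMPLE_MESSAGES = [
    "Build fails on the nightly server after the compiler upgrade",
    "Nightly build still fails, compiler upgrade suspected",
    "Schedule review meeting moved to Friday",
    "Please confirm the Friday schedule review meeting",
]


def extract_keywords(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) > 2]


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF vectors (dicts), with smoothed IDF."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cluster_messages(messages, threshold=0.3):
    """Group message indices: two messages share a cluster when they are
    connected by a chain of pairs with similarity >= threshold."""
    vectors = tfidf_vectors([extract_keywords(m) for m in messages])
    parent = list(range(len(messages)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # The two build-related and the two schedule-related messages pair up.
    print(cluster_messages(SAMPLE_MESSAGES))  # [[0, 1], [2, 3]]
```

The threshold-linking step is the simplest choice that yields clusters from a similarity matrix; it tends to chain loosely related messages together on larger archives, so a real analysis would likely need a more careful clustering method and a tuned threshold.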
