Exploring, exposing, and exploiting emails to include human factors in software engineering

Researchers mine software repositories to support software maintenance and evolution. Analyzing the structured data these repositories contain, mainly source code and changes, has several benefits and yields precise results. This data, however, leaves communication in the background and does not permit a deep investigation of the human factor, which is crucial in software engineering. Software repositories also archive documents, such as emails and comments, that people use to exchange knowledge; we call this "people-centric information." By analyzing this data, we include the human factor in our analysis, yet its unstructured nature means it is currently underexploited. Focusing on email communication and implementing the necessary tools, our work investigates methods for exploring, exposing, and exploiting unstructured data. We believe it is possible to close the gap between development and communication, to extract developers' opinions, habits, and views, and to link implementation to its rationale. We envision a future in which software analysis and development are routinely augmented with people-centric information.
