Global and Geographically Distributed Work Teams: Understanding the Bug Fixing Process and Potentially Bug-prone Activity Patterns

This dissertation studies the roots of errors introduced involuntarily into source code by analyzing the history stored in source code management systems (SCMs). The event that introduces an error is named a bug seeding commit, and its place in the bug life cycle is studied from several perspectives, among others: a) an expanded bug life cycle that considers more than the bug tracking system (BTS) alone, b) the time to fix a bug, and c) the estimation of the bug finding rate. In addition, we study the potential causes of the seeding, focusing on two human-related aspects: developer experience and time of day. The main results are as follows. The time span from the introduction of a bug to its discovery represents, on average, 60% to 90% of its total life. The discovery rate of bugs follows a linear distribution in most cases, a result that can help in estimating future effort in the project. Regarding human-related factors, no relationship was found between experience and the ratio of buggy activity of the developers. Finally, changes to the source code are more prone to be buggy in specific time slots of the day, specifically during the evening period of activity.
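The 60% to 90% figure can be read as a ratio of two time spans. The following is a minimal sketch of that arithmetic, not the dissertation's tooling: it assumes that a bug's total life runs from the bug seeding commit to the fixing commit, and that "discovery" is the date the bug was reported. The function name `undiscovered_share` and the example dates are hypothetical.

```python
from datetime import datetime


def undiscovered_share(seeded: datetime, discovered: datetime, fixed: datetime) -> float:
    """Return the fraction of a bug's total life spent before its discovery.

    Assumption: total life is measured from the bug seeding commit to the fix.
    """
    if not (seeded <= discovered <= fixed):
        raise ValueError("expected seeded <= discovered <= fixed")
    total_seconds = (fixed - seeded).total_seconds()
    if total_seconds == 0:
        # Seeded, discovered and fixed at the same instant: nothing was hidden.
        return 0.0
    return (discovered - seeded).total_seconds() / total_seconds


# Hypothetical bug: seeded on 1 Jan 2010, reported on 20 Mar 2010,
# fixed on 11 Apr 2010 (78 days undiscovered out of a 100-day life).
share = undiscovered_share(
    datetime(2010, 1, 1),
    datetime(2010, 3, 20),
    datetime(2010, 4, 11),
)
print(f"{share:.0%} of the bug's life passed before discovery")
```

Averaging this ratio over all bugs with a known seeding commit would give a project-level figure comparable to the range quoted above, under the same assumptions.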
