On the Use of Process Trails to Understand Software Development

Software repositories, such as version control systems (e.g., CVS) and bug-tracking systems (e.g., Bugzilla), record the process trails that developers leave during the evolution of a software project. Mining these repositories provides a way to understand software development, to support predictions about it, and to plan various aspects of software projects. We introduce three cases, in the areas of impact analysis, change request assignment, and crosscutting concern mining, that benefit from historical information, and we show that combining different types of analysis can improve the performance of these software engineering models.
