Mining Software Repositories with iSPARQL and a Software Evolution Ontology

One of the most important decisions researchers face when analyzing the evolution of software systems is the choice of a proper data analysis and exchange format. Most existing formats must be processed with special programs written specifically for that purpose and are not easily extensible. Most scientists therefore use their own databases, which requires each of them to repeat the work of writing import/export programs for their format. We present EvoOnt, a software repository data exchange format based on the Web Ontology Language (OWL). EvoOnt includes software, release, and bug-related information. Since OWL describes the semantics of the data, EvoOnt (1) is easily extensible, (2) is supported by many existing tools, and (3) allows new assertions to be derived through its inherent Description Logic reasoning capabilities. The paper also presents iSPARQL, our SPARQL-based Semantic Web query engine with support for similarity joins. Together with EvoOnt, iSPARQL can accomplish a sizable number of tasks sought in software repository mining projects, such as assessing the amount of change between versions or detecting bad code smells. To illustrate the usefulness of EvoOnt (and iSPARQL), we perform a series of experiments with a real-world Java project. These show that a number of software analyses can be reduced to simple iSPARQL queries on an EvoOnt dataset.
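A similarity join pairs up items whose values are *similar* rather than identical. As a rough illustration of the idea, and not of iSPARQL's actual syntax or implementation, the sketch below joins the class names of two hypothetical releases using a normalized Levenshtein similarity. The choice of Levenshtein as the measure, the class names, the 0.8 threshold, and the function names (`similarity`, `similarity_join`) are all assumptions made for illustration.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    # prev[j] is the distance between the prefix of `a` processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def similarity_join(left: List[str], right: List[str],
                    threshold: float) -> List[Tuple[str, str, float]]:
    """Nested-loop similarity join: keep every pair scoring >= threshold."""
    result = []
    for x in left:
        for y in right:
            s = similarity(x, y)
            if s >= threshold:
                result.append((x, y, round(s, 3)))
    return result


# Hypothetical class names from two releases of the same project.
release_a = ["BugReport", "Parser", "FileReader"]
release_b = ["BugReports", "Parser", "StreamWriter"]
print(similarity_join(release_a, release_b, 0.8))
# [('BugReport', 'BugReports', 0.9), ('Parser', 'Parser', 1.0)]
```

The nested loop compares every pair, which keeps the sketch simple but is quadratic in the input sizes. An exact join would have missed the renamed `BugReport` class entirely.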
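The idea that software analyses reduce to simple queries over an ontology-based dataset can be made concrete with a toy example. The sketch below stores a few facts as (subject, predicate, object) triples and evaluates a basic graph pattern, the core of a SPARQL `SELECT`, by extending variable bindings one pattern at a time. It then flags classes with many methods as a crude size-based proxy for a "large class" smell. The predicate names (`type`, `hasMethod`), the class names, and the method-count threshold are hypothetical stand-ins, not EvoOnt's actual vocabulary. A real setup would load the OWL data into a triple store and issue an (i)SPARQL query instead.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical EvoOnt-style facts; all names are illustrative only.
TRIPLES: List[Triple] = [
    ("Parser", "type", "Class"),
    ("Parser", "hasMethod", "Parser.parse"),
    ("Parser", "hasMethod", "Parser.reset"),
    ("Engine", "type", "Class"),
    ("Engine", "hasMethod", "Engine.run"),
    ("Engine", "hasMethod", "Engine.stop"),
    ("Engine", "hasMethod", "Engine.load"),
    ("Engine", "hasMethod", "Engine.save"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Return every binding that satisfies all triple patterns (a join)."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return bindings


def classes_with_many_methods(triples: List[Triple], max_methods: int) -> List[str]:
    """Flag classes declaring more than `max_methods` methods."""
    rows = query([("?c", "type", "Class"), ("?c", "hasMethod", "?m")], triples)
    counts = Counter(row["?c"] for row in rows)
    return sorted(c for c, n in counts.items() if n > max_methods)


print(classes_with_many_methods(TRIPLES, 3))  # ['Engine']
```

Sharing the variable `?c` across both patterns is what turns the two lookups into a join. This is the same mechanism a SPARQL engine uses when several triple patterns mention one variable.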
