Identifying Similar Software Datasets through Fuzzy Inference System

Similar software systems have similar software measurements, so defect data from one system can be used to anticipate defects in a similar one. Although few defect datasets are publicly available in the software engineering domain, the PROMISE repository offers a reasonable collection of software data. This paper presents a two-step approach to identifying similar software and applies it to find similar datasets in the PROMISE repository. In step 1, the approach generates association rules for each dataset to characterize the dataset's behavior in terms of frequent patterns. In step 2, the overlap between the association rules of two datasets is calculated using Fuzzy Inference Systems (FIS). Both expert-based and auto-generated FIS were built for the study. Similarity was computed for 28 dataset pairs. KC2 and PC1 turned out to be the most similar datasets, with 86% similarity using the Mamdani model and 92% using the Sugeno model. Results from the expert-based and auto-generated FIS were comparable.
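The two steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the FIS inputs, membership functions, or rule base, so everything below is an assumption made for illustration. The sketch assumes each dataset's association rules are already mined and stored as a dictionary mapping a rule to its confidence, takes two hypothetical inputs (`overlap`, the Jaccard overlap of the two rule sets, and `agreement`, how closely the confidences of shared rules match), and combines them with a small zero-order Sugeno-style rule base whose output constants are invented.

```python
# Illustrative sketch only: inputs, membership functions, and rule base
# are assumptions, not the configuration used in the paper.

def rule_overlap(rules_a, rules_b):
    """Jaccard overlap between the rule sets of two datasets."""
    keys_a, keys_b = set(rules_a), set(rules_b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0

def confidence_agreement(rules_a, rules_b):
    """1 minus the mean absolute confidence gap over shared rules."""
    shared = set(rules_a) & set(rules_b)
    if not shared:
        return 0.0
    gap = sum(abs(rules_a[r] - rules_b[r]) for r in shared) / len(shared)
    return 1.0 - gap

def low(x):
    """Hypothetical linear membership for 'low' on [0, 1]."""
    return 1.0 - x

def high(x):
    """Hypothetical linear membership for 'high' on [0, 1]."""
    return x

# Hypothetical rule base: (membership for overlap, membership for
# agreement, constant output). Zero-order Sugeno consequents.
RULE_BASE = [
    (high, high, 1.0),
    (high, low, 0.6),
    (low, high, 0.4),
    (low, low, 0.0),
]

def sugeno_similarity(overlap, agreement):
    """Weighted average of rule outputs, using min as the fuzzy AND."""
    strengths = [min(m1(overlap), m2(agreement)) for m1, m2, _ in RULE_BASE]
    total = sum(strengths)
    if total == 0.0:
        return 0.0
    weighted = sum(w * out for w, (_, _, out) in zip(strengths, RULE_BASE))
    return weighted / total

# Made-up rules: (antecedent items, consequent) -> confidence.
rules_x = {
    (frozenset({"loc=high"}), "defective=yes"): 0.80,
    (frozenset({"complexity=low"}), "defective=no"): 0.90,
}
rules_y = {
    (frozenset({"loc=high"}), "defective=yes"): 0.70,
    (frozenset({"comments=few"}), "defective=yes"): 0.60,
}

similarity = sugeno_similarity(
    rule_overlap(rules_x, rules_y),
    confidence_agreement(rules_x, rules_y),
)
print(f"similarity = {similarity:.2f}")
```

A Mamdani-style variant would replace the constant outputs with fuzzy output sets and add a defuzzification step such as the centroid method. The weighted average used here is the simpler Sugeno-style combination.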
