Feature Location in a Collection of Product Variants: Combining Information Retrieval and Hierarchical Clustering

Locating the source code elements relevant to a given feature is an important step in re-engineering software variants, developed through ad-hoc reuse, into a Software Product Line (SPL) for systematic reuse. Existing work that applies Information Retrieval (IR) techniques to this task does not consider the abstraction gap between the feature and source code levels. In our recent work, we improved the effectiveness of IR-based feature location by introducing an intermediate level between the feature and source code levels, called “code-topics”. We used Formal Concept Analysis (FCA) to identify such code-topics. In this paper, we investigate the results of using an Agglomerative Hierarchical Clustering (AHC) algorithm to identify code-topics. In our experimental evaluation, we show that AHC significantly increases the recall of feature location, with a minor decrease in precision, compared to FCA.
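To make the idea of grouping source code elements into code-topics with AHC more concrete, the following is a minimal sketch, not the procedure evaluated in this paper. It assumes that each code element is represented by a bag of identifier terms, that similarity is the cosine of term-count vectors, that clusters are merged by average linkage, and that merging stops below a fixed similarity threshold. The function names, the element names, the terms, and the threshold value are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_code_topics(elements, threshold=0.3):
    """Agglomerative clustering of code elements into candidate code-topics.

    elements: dict mapping an element name to a string of its terms.
    threshold: assumed cutoff; merging stops when the best average-linkage
               similarity between any two clusters falls below it.
    """
    vectors = {name: Counter(text.lower().split())
               for name, text in elements.items()}
    # Start with one singleton cluster per code element.
    clusters = [[name] for name in elements]

    def linkage(c1, c2):
        # Average linkage: mean pairwise similarity between the clusters.
        sims = [cosine(vectors[x], vectors[y]) for x in c1 for y in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)

    return [sorted(c) for c in clusters]


# Hypothetical code elements with hand-picked terms, for illustration only.
elements = {
    "Shape.draw": "draw shape canvas line",
    "Shape.drawLine": "draw line canvas point",
    "File.save": "save file disk write",
    "File.export": "export file disk write",
}
topics = cluster_code_topics(elements)
print(topics)
```

In this toy input the two drawing elements share three of their four terms, as do the two file elements, while the two groups share none, so the sketch stops with two clusters. In an IR-based pipeline, each resulting cluster would then be matched against feature descriptions instead of matching individual code elements directly.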