Keyword selection method for characterizing text document maps

Characterization of subsets of data is a recurring problem in data mining. We propose a keyword selection method that can be used for obtaining characterizations of clusters of data whenever textual descriptions can be associated with the data. Several methods that cluster data sets or form projections of data also provide an ordering of the clusters or a distance measure between them. If such an ordering of the clusters exists or can be deduced, the method utilizes it to improve the characterizations. The proposed method may be applied, for example, to characterizing graphical displays of ordered collections of data (e.g. collections organized with the SOM algorithm). The method is validated using a collection of 10,000 scientific abstracts from the INSPEC database organized on a WEBSOM document map.
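
To make the idea of order-aware keyword selection concrete, the following is a minimal illustrative sketch in Python. It is not the scoring function of the proposed method, whose exact form is not given in this abstract. The function name `select_keywords`, the one-dimensional `positions` ordering, the `radius` parameter, and the scoring rule itself are all assumptions made for illustration. The sketch favors words that are frequent within a cluster and rare in clusters lying farther away in the ordering, while not penalizing words that are shared with neighboring clusters.

```python
from collections import Counter


def select_keywords(cluster_docs, positions, radius=1, top_k=3):
    """Pick keywords per cluster with a hypothetical order-aware score.

    cluster_docs: one entry per cluster, each a list of tokenized documents.
    positions:    assumed 1-D position of each cluster in the ordering.
    radius:       clusters within this distance count as neighbors and
                  are ignored when penalizing a word (an assumption).
    """
    # Relative frequency of every word inside each cluster.
    rel = []
    for docs in cluster_docs:
        counts = Counter(tok for doc in docs for tok in doc)
        total = sum(counts.values())
        rel.append({w: c / total for w, c in counts.items()} if total else {})

    keywords = []
    for j, freqs in enumerate(rel):
        scores = {}
        for w, f in freqs.items():
            # Mass of the word in clusters that are far away in the ordering.
            far = sum(
                rel[i].get(w, 0.0)
                for i in range(len(rel))
                if abs(positions[i] - positions[j]) > radius
            )
            # Illustrative score: high when the word is frequent here
            # and rare in distant clusters.
            scores[w] = f * f / (f + far)
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords.append(ranked[:top_k])
    return keywords
```

In this sketch a word that occurs in every cluster receives a low score everywhere, whereas a word shared only by adjacent clusters can still be selected for each of them, which is one plausible way an ordering could sharpen the characterizations.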