A New Label Maximization Based Incremental Neural Clustering Approach: Application to Text Clustering

Neural clustering algorithms perform well on homogeneous textual datasets. This is especially true of the recent adaptive versions of these algorithms, such as the incremental growing neural gas algorithm (IGNG) and the label maximization based incremental growing neural gas algorithm (IGNG-F). In this paper we show that the performance of these algorithms, like that of more classical algorithms, decreases drastically when a heterogeneous textual dataset is given as input. For precise performance evaluation, we use specific quality measures and cluster labeling techniques that are independent of the clustering method. We propose variations of the incremental growing neural gas algorithm that incrementally exploit knowledge of the clusters' current labeling together with cluster distance measures. This solution yields a significant gain in performance for all types of datasets, especially for the clustering of complex heterogeneous textual data.
