Paper information - Feature diversity in cluster ensembles for robust document clustering

Feature diversity in cluster ensembles for robust document clustering

The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one clustering problem to another. As a first step towards building robust document clusterers, a strategy based on feature diversity and cluster ensembles is presented in this work. Experiments conducted on a binary clustering problem show that our method is robust to near-optimal model order selection and able to detect constructive interactions between different document representations in the test bed.

Joan Claudi Socoró | Xavier Sevillano | Francesc Alías | Germán Cobo

[1] H. Sebastian Seung, et al. Learning the parts of objects by non-negative matrix factorization, 1999, Nature.

[2] Fabrizio Sebastiani, et al. Machine learning in automated text categorization, 2001, CSUR.

[3] Anil K. Jain, et al. Data clustering: a review, 1999, CSUR.

[4] Joydeep Ghosh, et al. Relationship-based clustering and cluster ensembles for high-dimensional data mining, 2002.

[5] S. H. Srinivasan. Features for Unsupervised Document Classification, 2002, CoNLL.

[6] Paul A. Viola, et al. Restructuring Sparse High Dimensional Data for Effective Retrieval, 1998, NIPS.

[7] Richard A. Harshman, et al. Indexing by Latent Semantic Analysis, 1990, J. Am. Soc. Inf. Sci.