Density Based Text Clustering

As the discovery of information from text corpora becomes more and more important, there is a need to develop clustering algorithms designed for this task. Density-based methods are among the most successful approaches to clustering. However, because text data is very high-dimensional, these algorithms are not directly applicable. In this paper we demonstrate the need to suitably exploit existing feature reduction techniques in order to maximize the clustering performance of density-based methods.
