Conceptual Clustering of Text Clusters

Common clustering techniques have the disadvantage that they do not provide intensional descriptions of the clusters they produce. Conceptual clustering techniques do provide such descriptions, but are known to be rather slow. In this paper, we discuss a way of combining the two. In the first step, we cluster the documents with a variant of k-means, using a thesaurus as background knowledge. This reduces the large number of documents to a relatively small number of clusters. In the second step, these clusters are clustered conceptually, which yields intensional descriptions at a cost that depends on the number of clusters rather than the number of documents.
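The two-step idea can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy documents, the hypothetical `THESAURUS` mapping, the fixed seed documents, the plain k-means loop (the abstract only says "a variant of k-means"), the `THRESHOLD` used to turn centroids into attribute sets, and the choice of Formal Concept Analysis as the conceptual clustering method in step two.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy thesaurus: each term maps to its more general concepts.
THESAURUS = {
    "beef": ["meat", "food"], "pork": ["meat", "food"],
    "apple": ["fruit", "food"], "pear": ["fruit", "food"],
    "loan": ["finance"], "bank": ["finance"],
}

# Hypothetical toy corpus.
DOCS = [
    "beef pork price", "pork beef market",
    "apple pear price", "pear apple market",
    "loan bank market", "bank loan price",
]

def concept_counts(doc):
    """Count each term and, as background knowledge, its thesaurus concepts."""
    counts = Counter()
    for term in doc.split():
        counts[term] += 1
        for concept in THESAURUS.get(term, []):
            counts[concept] += 1
    return counts

def kmeans(vectors, seeds, max_iter=20):
    """Plain k-means with squared Euclidean distance and fixed seed documents."""
    centroids = [list(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))
            for v in vectors
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def formal_concepts(context):
    """Naively enumerate all formal concepts (extent, intent) of a context.

    context maps each object to its attribute set. The loop is exponential
    in the number of objects, which is acceptable only because step one has
    already reduced the documents to a handful of clusters.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = set(all_attrs)
            for obj in subset:
                intent &= context[obj]
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts

# Step 1: cluster the documents on term-plus-concept vectors.
counts = [concept_counts(d) for d in DOCS]
vocab = sorted(set().union(*counts))
vectors = [[float(c[t]) for t in vocab] for c in counts]
labels, centroids = kmeans(vectors, seeds=[0, 2, 4])

# Step 2: describe each cluster by its heavy centroid terms, then cluster conceptually.
THRESHOLD = 1.0  # assumed cut-off; a real setting would need tuning
context = {
    c: {t for t, w in zip(vocab, centroid) if w >= THRESHOLD}
    for c, centroid in enumerate(centroids)
}
concepts = formal_concepts(context)

for extent, intent in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

In this toy setting, the two food-related clusters end up together in a concept whose intent is the shared thesaurus concept `food`. That intent is the kind of intensional description that k-means alone does not supply.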
