Concept analysis and web clustering using combinatorial topology

The concepts discussed in a document set can be represented by a geometric structure from combinatorial topology called a simplicial complex. A simplex is a set of high-frequency keywords that co-occur closely; we believe each such set carries a concept in the document set. The collection of all these simplexes forms the simplicial complex, which represents the structure of these concepts. The documents are clustered based on the topological structure of this complex. Several clustering schemes are presented. Our initial experiments support the theory, as expected.
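
To make the construction concrete, here is a minimal Python sketch of one way the idea could be realized. It is not the authors' implementation, and several choices are assumptions made for illustration: "co-occurs closely" is simplified to "appears in the same document", the support threshold `min_support` and the size cap `max_size` are hypothetical parameters, and the clustering scheme shown (grouping documents by the connected components of the complex) is only one plausible scheme, not necessarily one of those presented in the paper. All function names are hypothetical.

```python
from itertools import combinations


def frequent_keyword_sets(docs, min_support, max_size=3):
    """Brute-force search for keyword sets found in >= min_support documents.

    Each returned frozenset plays the role of a simplex. Assumption: keywords
    "co-occur" if they appear in the same document.
    """
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets)) if doc_sets else []
    simplexes = {}
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            support = sum(1 for d in doc_sets if d.issuperset(combo))
            if support >= min_support:
                simplexes[frozenset(combo)] = support
    return simplexes


def maximal_simplexes(simplexes):
    """Keep only simplexes that are not a proper face of another simplex."""
    return [s for s in simplexes if not any(s < t for t in simplexes)]


def connected_components(simplexes):
    """Group keywords into connected components of the complex (union-find)."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s in simplexes:
        for v in s:
            parent.setdefault(v, v)
    for s in simplexes:
        verts = sorted(s)
        for v in verts[1:]:
            parent[find(v)] = find(verts[0])

    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    # Sort for a deterministic component order.
    return sorted(groups.values(), key=lambda g: sorted(g))


def cluster_documents(docs, components):
    """Assign each document to the component sharing the most keywords with it."""
    labels = []
    for d in docs:
        overlaps = [len(set(d) & c) for c in components]
        if overlaps and max(overlaps) > 0:
            labels.append(overlaps.index(max(overlaps)))
        else:
            labels.append(None)  # no frequent keyword found in this document
    return labels


# Toy document set: each document is already reduced to a keyword list.
docs = [
    ["wall", "street", "stock"],
    ["wall", "street", "market"],
    ["neural", "network", "training"],
    ["neural", "network", "layer"],
]
simplexes = frequent_keyword_sets(docs, min_support=2)
components = connected_components(maximal_simplexes(simplexes))
labels = cluster_documents(docs, components)
print(components)
print(labels)
```

In this toy example the keyword pairs {wall, street} and {neural, network} are the maximal simplexes, they fall into two separate components, and the four documents split into two clusters accordingly. The exhaustive enumeration is only practical for tiny vocabularies; a real system would more likely use a frequent-itemset miner.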
