Generating Associated Relation between Documents

Traditional text mining techniques have limited ability to provide associated relations with rich semantics. Such relations are a foundation for intelligent topic browsing, semantic community discovery, and precise personalized recommendation in the current Web and the Knowledge Grid. In this paper, we propose an algorithm that generates the associated relations between documents within a domain and calculates their strengths. Each document is represented by a bag of words and their weights. We first build a domain knowledge background from keyword-level association rules. We then apply these rules to generate the documents' semantic relations and calculate their strengths at the document level, which effectively narrows the semantic gap between keyword semantics and document semantics. Experimental results show that the proposed method is feasible and can discover interesting facts within a domain.
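The abstract does not give the rule-mining thresholds or the formula for relation strength, so the following is only a minimal Python sketch of the two-level idea under stated assumptions. Keyword-level rules are restricted to single-keyword rules X → Y, mined by co-occurrence counting with support and confidence thresholds in the usual association-rule style. The directed strength from document A to document B is a hypothetical score: the sum of w_A(X) · conf(X → Y) · w_B(Y) over rules whose antecedent occurs in A and whose consequent occurs in B, divided by the product of the two documents' total keyword weights so that it lies in [0, 1]. The function names, thresholds, and scoring are illustrative and are not taken from the paper.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# A document as a bag of words: keyword -> weight (assumed non-negative).
Doc = Dict[str, float]
Rules = Dict[Tuple[str, str], float]  # (antecedent, consequent) -> confidence


def mine_pair_rules(docs: List[Doc], min_support: float, min_conf: float) -> Rules:
    """Keyword level: mine single-keyword rules X -> Y from co-occurrence.

    Assumption: a keyword "occurs" in a document if it is in its bag of words;
    weights are ignored when counting support and confidence.
    """
    n = len(docs)
    item_count: Dict[str, int] = {}
    pair_count: Dict[Tuple[str, str], int] = {}
    for doc in docs:
        kws = sorted(doc)  # sorted so each unordered pair has one key
        for k in kws:
            item_count[k] = item_count.get(k, 0) + 1
        for a, b in combinations(kws, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules: Rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:
            continue
        # Check both directions of the frequent pair as candidate rules.
        for x, y in ((a, b), (b, a)):
            conf = count / item_count[x]
            if conf >= min_conf:
                rules[(x, y)] = conf
    return rules


def relation_strength(src: Doc, dst: Doc, rules: Rules) -> float:
    """Document level (hypothetical score): directed strength src -> dst.

    Sums w_src(x) * conf(x -> y) * w_dst(y) over rules with x in src and
    y in dst, normalized by the product of total weights, giving [0, 1].
    """
    total = 0.0
    for (x, y), conf in rules.items():
        if x in src and y in dst:
            total += src[x] * conf * dst[y]
    denom = sum(src.values()) * sum(dst.values())
    return total / denom if denom > 0 else 0.0


def associated_relations(
    docs: List[Doc], rules: Rules, threshold: float
) -> List[Tuple[int, int, float]]:
    """Return directed edges (i, j, strength) whose strength meets threshold."""
    edges = []
    for i, src in enumerate(docs):
        for j, dst in enumerate(docs):
            if i != j:
                s = relation_strength(src, dst, rules)
                if s >= threshold:
                    edges.append((i, j, s))
    return edges


if __name__ == "__main__":
    corpus = [
        {"grid": 0.9, "semantic": 0.5},
        {"grid": 0.7, "semantic": 0.6, "web": 0.2},
        {"web": 0.8, "browser": 0.4},
        {"grid": 0.4, "semantic": 0.3},
    ]
    rules = mine_pair_rules(corpus, min_support=0.5, min_conf=0.6)
    for i, j, s in associated_relations(corpus, rules, threshold=0.1):
        print(f"doc{i} -> doc{j}: {s:.3f}")
```

In this toy corpus only the pair ("grid", "semantic") is frequent enough, so the third document, which shares no rule keywords with the others, gets no relations. Because the score is directed, a rule X → Y can link A to B without linking B to A, which matches the notion of an associated relation better than a symmetric similarity; a real implementation would also need multi-keyword antecedents and the paper's own strength definition.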
