A Text Mining Model for Hypothesis Generation

This paper presents a tool for detecting links between two topics (e.g., two individuals) across documents. We interpret such a query as a search for the most meaningful evidence trail across documents that connects the two topics. To find new knowledge, we apply link analysis techniques to the features extracted by an information extraction engine. The proposed approach is based on a concept association graph and combines text mining, information retrieval, and link analysis techniques. Experimental results on a counterterrorism corpus demonstrate the effectiveness of our algorithm. Specifically, the algorithm generates ranked concept chains in which the key terms representing significant relationships between topics are ranked high.
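The abstract does not specify how the concept association graph is built or how chains are scored, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each document has already been reduced to a set of extracted concepts, that edge weights are plain co-occurrence counts, and that a chain is scored by its weakest edge. The function names, the `max_len` parameter, and the sample concepts are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_concept_graph(docs):
    """Build an undirected graph whose edge weight is the number of
    documents in which two concepts co-occur (an assumed weighting)."""
    graph = defaultdict(lambda: defaultdict(int))
    for concepts in docs:
        for a, b in combinations(sorted(set(concepts)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def ranked_chains(graph, source, target, max_len=3):
    """Enumerate simple chains from source to target with at most
    max_len edges and rank them by their weakest edge (assumed score)."""
    results = []

    def dfs(node, path, score):
        if node == target:
            results.append((score, path))
            return
        if len(path) - 1 >= max_len:
            return
        for nxt, weight in graph.get(node, {}).items():
            if nxt not in path:
                dfs(nxt, path + [nxt], min(score, weight))

    dfs(source, [source], float("inf"))
    # Strongest chains first; ties broken by shorter, then alphabetical.
    results.sort(key=lambda r: (-r[0], len(r[1]), r[1]))
    return results


# Hypothetical concept sets, one per document.
docs = [
    {"person_a", "org_x"},
    {"org_x", "person_b"},
    {"person_a", "org_x"},
    {"person_a", "city_y"},
    {"city_y", "person_b"},
    {"org_x", "person_b"},
]
graph = build_concept_graph(docs)
chains = ranked_chains(graph, "person_a", "person_b")
for score, path in chains:
    print(score, " -> ".join(path))
```

In this toy corpus the chain through `org_x` ranks above the chain through `city_y` because both of its edges are supported by two documents. A real system would likely replace raw counts with a more discriminating association measure.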
