The Impact of Directionality in Predications on Text Mining

The number of biomedical publications grows substantially each year. To help researchers digest this literature, text mining tools are being developed that present co-occurrence relations between concepts, and statistical measures are used to mine interesting subsets of those relations. We demonstrate how the directionality of these relations affects interestingness, using support and confidence, two simple data mining statistics, as proxies for interestingness metrics. We first built a test bed of 126,404 directional relations extracted from biomedical abstracts, which we represent as graphs containing a central starting concept and two rings of associated relations. We then manipulated directionality in four ways, producing four graph types, and randomly selected 100 starting concepts as a test sample for each graph type. Finally, we calculated the number of relations and their support and confidence. Variation in directionality significantly affected the number of relations as well as their support and confidence across the four graph types.
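To make the role of directionality concrete, the following minimal sketch computes support and confidence for a handful of relations, once respecting direction and once ignoring it. It rests on assumptions not stated in the abstract: support is taken as a relation's share of all relation instances, confidence as its share of the instances that start from the same concept, and "ignoring direction" is modeled by adding the reverse of every relation. The function name `support_confidence` and the four toy relations are hypothetical and are not drawn from the 126,404-relation test bed.

```python
from collections import Counter


def support_confidence(relations, directed=True):
    """Map each (source, target) pair to a (support, confidence) tuple.

    Assumed definitions (not taken from the paper):
      support    = count(source -> target) / total relation instances
      confidence = count(source -> target) / count(relations leaving source)
    """
    if not directed:
        # Model an undirected graph by also counting every relation in reverse.
        relations = relations + [(b, a) for a, b in relations]
    pair_counts = Counter(relations)
    source_counts = Counter(a for a, _ in relations)
    total = len(relations)
    return {
        (a, b): (n / total, n / source_counts[a])
        for (a, b), n in pair_counts.items()
    }


# Hypothetical toy relations, for illustration only.
relations = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's"),
]

directed = support_confidence(relations, directed=True)
undirected = support_confidence(relations, directed=False)

pair = ("blood viscosity", "Raynaud's")
print(len(directed), len(undirected))  # number of distinct relations
print(directed[pair], undirected[pair])
```

In this toy case, dropping direction doubles the number of distinct relations and lowers the confidence of the `blood viscosity` to `Raynaud's` relation, because `blood viscosity` gains additional outgoing edges. This is the kind of shift in relation counts, support, and confidence that the study measures across its four graph types, although the real manipulations may differ from the one assumed here.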
