Mining Novel Knowledge from Biomedical Literature using Statistical Measures and Domain Knowledge

The massive and unprecedented volume of scientific literature readily available in biomedicine presents both challenges and opportunities for accelerating hypothesis generation. Advanced text mining techniques are required to leverage this abundant text, providing timely access to explicit facts and helping to elucidate associations among implicit facts. The problem of inferring novel knowledge from these implicit facts by logically connecting independent fragments of literature is known as Literature Based Discovery (LBD). In LBD, discovering hidden links requires determining the relevance between concepts using appropriate information measures. In this paper, to discover interesting and inherent links latent in large corpora, we design and compare nine distinct methods, comprising variants of statistical information measures and semantic knowledge derived from a domain ontology. To make the results easier to interpret, we split the methods into three groups. The first group includes traditional information measures such as Mutual Information, Chi-Square, and those used in association rule mining; the second group incorporates popular null-invariant correlation measures: All_Confidence, Kulczynski, and Cosine; the third group consists of null-invariant measures combined with our proposed notion of semantic relatedness. We also propose a new preprocessing strategy that removes terms that are spurious, semantically unrelated, or unlikely to constitute a new discovery. A series of experiments is performed and analyzed for the proposed methods. In addition, we provide an organized list of final concepts deemed worthy of scientific investigation or experimentation. Overall, our research presents a comprehensive analysis of how different statistical information measures and semantic knowledge affect the knowledge discovery procedure.
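As a rough illustration of the second group of measures, the sketch below computes All_Confidence, Kulczynski, and Cosine from document co-occurrence counts, using their standard definitions from the association-mining literature. The function names, the count-based inputs, and the example numbers are assumptions made for illustration and are not taken from the paper; lift is included only as a contrasting measure that is not null-invariant.

```python
import math

# Hypothetical inputs (not from the paper):
#   sup_a  = number of documents mentioning concept A
#   sup_b  = number of documents mentioning concept B
#   sup_ab = number of documents mentioning both A and B
#   n_docs = total number of documents in the corpus


def all_confidence(sup_a, sup_b, sup_ab):
    """sup(AB) / max(sup(A), sup(B)): the smaller of the two conditional probabilities."""
    return sup_ab / max(sup_a, sup_b)


def kulczynski(sup_a, sup_b, sup_ab):
    """Arithmetic mean of P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def cosine(sup_a, sup_b, sup_ab):
    """Geometric mean of P(B|A) and P(A|B)."""
    return sup_ab / math.sqrt(sup_a * sup_b)


def lift(sup_a, sup_b, sup_ab, n_docs):
    """P(AB) / (P(A) * P(B)); depends on n_docs, so it is NOT null-invariant."""
    return (sup_ab * n_docs) / (sup_a * sup_b)


if __name__ == "__main__":
    sup_a, sup_b, sup_ab = 100, 400, 50  # made-up counts
    print("all_confidence:", all_confidence(sup_a, sup_b, sup_ab))
    print("kulczynski:   ", kulczynski(sup_a, sup_b, sup_ab))
    print("cosine:       ", cosine(sup_a, sup_b, sup_ab))
    # Adding documents that mention neither concept changes lift,
    # while the three measures above take no n_docs argument at all.
    for n_docs in (10_000, 100_000):
        print(f"lift (n_docs={n_docs}):", lift(sup_a, sup_b, sup_ab, n_docs))
```

None of the three null-invariant functions takes the corpus size as an argument, so documents that mention neither concept cannot affect their values. This matters in a large, sparse corpus where most documents mention neither concept of a given pair.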
