Can the vector space model be used to identify biological entity activities?

Background: Biological systems are commonly described as networks of entity interactions. Some interactions are already known and form part of current knowledge in the life sciences. Others remain unknown for long periods and are frequently discovered by chance. In this work we present a model to predict these unknown interactions from a textual collection using the vector space model (VSM), a well-known and established information retrieval model. We extend the retrieval ability of the VSM with a transitive closure approach. Our objective is to use the VSM to identify known interactions from the literature and construct a network. Based on the interactions established in the network, our model applies the transitive closure to predict and rank new interactions.

Results: We have tested and validated our model using a collection of patent claims issued from 1976 to 2005. From 266,528 possible interactions in our network, the model identified 1,027 known interactions and predicted 3,195 new interactions. When the model was iterated according to patent issue dates, interactions found in a given past year were often confirmed by patent claims that were not in the collection and were issued in more recent years. Most confirming patent claims were found among the top 100 new interactions obtained from each subnetwork. We have also found papers on the Web that confirm newly inferred interactions. For instance, the best new interaction inferred by our model links the neurotransmitter adrenaline and the androgen receptor gene. We have found a paper that reports the partial dependence of the antiapoptotic effect of adrenaline on the androgen receptor.

Conclusions: The VSM extended with a transitive closure approach provides an effective way to identify biological interactions from textual collections.
Specifically, in the context of literature-based discovery, the extended VSM helps to identify and rank relevant new interactions even if these interactions occur in only a few documents in the collection. Consequently, we have developed an efficient method for extracting the best potential results and restricting attention to them as candidate advances in the life sciences, even when indications of these results are not easily observed in a large mass of documents.
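
The idea of combining the VSM with a transitive step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the term weighting, the threshold, or the ranking formula, so the choices below (raw occurrence counts per document, cosine similarity, a single transitive step, and the product of the two known similarities as the score of a predicted interaction) are assumptions made for illustration, as are all function names and the toy documents.

```python
from itertools import combinations
from math import sqrt


def entity_vectors(docs, entities):
    """Represent each entity as a sparse vector {doc_id: occurrence count}."""
    vectors = {e: {} for e in entities}
    for doc_id, text in enumerate(docs):
        tokens = text.lower().split()
        for e in entities:
            count = tokens.count(e)
            if count:
                vectors[e][doc_id] = count
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(u[d] * v[d] for d in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def known_interactions(vectors, threshold=0.0):
    """Pairs of entities whose similarity exceeds the (assumed) threshold."""
    known = {}
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score > threshold:
            known[(a, b)] = score
    return known


def predict_interactions(known):
    """One transitive step: if a-b and b-c are known but a-c is not,
    propose a-c, scored by the product of the two known similarities
    (an assumed scoring rule). Returns pairs ranked by score."""
    neighbours = {}
    for (a, b), score in known.items():
        neighbours.setdefault(a, {})[b] = score
        neighbours.setdefault(b, {})[a] = score
    predicted = {}
    for linked in neighbours.values():
        for a, c in combinations(sorted(linked), 2):
            if (a, c) in known:
                continue
            score = linked[a] * linked[c]
            predicted[(a, c)] = max(score, predicted.get((a, c), 0.0))
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)


# Toy collection, invented for illustration only.
docs = [
    "adrenaline activates pka",
    "pka phosphorylates ar",
    "adrenaline pka signalling",
]
vectors = entity_vectors(docs, ["adrenaline", "pka", "ar"])
ranking = predict_interactions(known_interactions(vectors))
print(ranking)
```

In this toy collection "adrenaline" and "ar" never share a document, so their direct similarity is zero, yet both co-occur with "pka"; the transitive step therefore proposes the pair as a new interaction. A full transitive closure would repeat the step until no new pairs appear.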
