Mining scientific literature to predict new relationships

Mining scientific literature can support the development of new science and technology. We propose a method that mines scientific literature to predict new relationships between a starting concept of interest and other concepts. In contrast to previous research, we measure the relationship between two concepts not only by their co-occurrence in scientific literature, but also by their sibling relationship in a hierarchical structure of concepts. As a result, the relationships predicted by our method are more pertinent to the relationships that already exist in current scientific literature. By introducing a parent set, we propose a measure of the closeness of two concepts in a hierarchical structure of concepts. To deal with the resulting combinatorial problem, we present two ways to limit the number of new relationships, which the user can apply interactively. As in most previous research on literature-based discovery, we choose biomedicine as the field in which to demonstrate our method. A comparison with related research shows that our method performs better, except in terms of recall. The new relationships predicted by this method can serve as candidates for new research themes, as sources of inspiration, or as hypotheses to be tested in the future.
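The abstract does not give the closeness measure or the two limiting mechanisms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that closeness is the Jaccard overlap of two concepts' parent sets, and that the two user-controlled limits are a minimum closeness threshold and a cap on the number of results. The function names, parameters, and the toy concept hierarchy are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents mention each unordered pair of concepts."""
    counts = Counter()
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


def parent_set_closeness(parents, a, b):
    """Jaccard overlap of two parent sets (an ASSUMED measure, not the paper's)."""
    pa, pb = parents.get(a, set()), parents.get(b, set())
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


def predict_relationships(start, documents, parents, min_closeness=0.5, top_k=10):
    """Rank concepts not yet co-occurring with `start` by sibling closeness.

    `min_closeness` and `top_k` are hypothetical stand-ins for the two
    user-controlled limits mentioned in the abstract.
    """
    counts = cooccurrence_counts(documents)
    # Concepts that already co-occur with the starting concept.
    known = {b if a == start else a for (a, b) in counts if start in (a, b)}
    candidates = {}
    for concept in parents:
        if concept == start or concept in known:
            continue
        # Best closeness to any concept already related to the start concept.
        score = max(
            (parent_set_closeness(parents, concept, k) for k in known),
            default=0.0,
        )
        if score >= min_closeness:
            candidates[concept] = score
    # Highest closeness first; ties broken alphabetically for determinism.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy data: concept names and parent categories are made up for illustration.
documents = [["migraine", "magnesium"], ["calcium", "zinc"]]
parents = {
    "migraine": {"headache disorders"},
    "magnesium": {"metals"},
    "calcium": {"metals"},
    "zinc": {"metals", "trace elements"},
    "aspirin": {"analgesics"},
}

if __name__ == "__main__":
    for concept, score in predict_relationships("migraine", documents, parents):
        print(concept, score)
```

In this toy setup, a concept becomes a candidate when it is a close sibling of something the starting concept already co-occurs with. Raising `min_closeness` or lowering `top_k` shrinks the candidate list, which is one plausible way a user could interactively control its size.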
