Recent Advances in Literature Based Discovery

Literature Based Discovery (LBD) is a process that searches for hidden and important connections among information embedded in published literature. Employing techniques from Information Retrieval and Natural Language Processing, LBD has potential for widespread application, yet it is currently implemented primarily in the medical domain. This article examines several published LBD systems, comparing their domains and input data, techniques for locating important concepts in text, models of discovery, experimental results, visualizations, and evaluation of results. Because there is no comprehensive “gold standard” or consistent formal evaluation methodology for LBD systems, the article also discusses the development and use of effective metrics for such systems and presents several options. Finally, because LBD is currently often time-intensive, requiring human input at one or more points, a fully automated system would improve the efficiency of the process; the article therefore considers methods for automated systems based on data mining.
