Enhancing the accuracy of knowledge discovery: a supervised learning method

BackgroundThe amount of biomedical literature available is growing at an explosive speed, but a large amount of useful information remains undiscovered in it. Researchers can make informed biomedical hypotheses through mining this literature. Unfortunately, popular mining methods based on co-occurrence produce too many target concepts, leading to the declining relevance ranking of the potential target concepts.MethodsThis paper presents a new method for selecting linking concepts which exploits statistical and textual features to represent each linking concept, and then classifies them as relevant or irrelevant to the starting concepts. Relevant linking concepts are then used to discover target concepts.ResultsThrough an evaluation it is observed textual features improve the results obtained with only statistical features. We successfully replicate Swanson's two classic discoveries and find the rankings of potentially relevant target concepts are relatively high.ConclusionsThe number of target concepts is greatly reduced and potentially relevant target concepts gain higher ranking by adopting only relevant linking concepts. Thus, the proposed method has the potential to help biomedical experts find the most useful and valuable target concepts effectively.

[1]  Saso Dzeroski,et al.  Supporting Discovery in Medicine by Association Rule Mining in Medline and UMLS , 2001, MedInfo.

[2]  Michael D. Gordon,et al.  Literature-Based Discovery by Lexical Statistics , 1999, J. Am. Soc. Inf. Sci..

[3]  D. Swanson Migraine and Magnesium: Eleven Neglected Connections , 2015, Perspectives in biology and medicine.

[4]  Trevor Cohen,et al.  Discovery at a distance: Farther journeys in predication space , 2012, 2012 IEEE International Conference on Bioinformatics and Biomedicine Workshops.

[5]  WeeberMarc,et al.  Using concepts in literature-based discovery , 2001 .

[6]  Wanda Pratt,et al.  H.3.3 Information Search and Retrieval , 2022 .

[7]  Trevor Cohen,et al.  Predication-based Semantic Indexing: Permutations as a Means to Encode Predications in Semantic Space , 2009, AMIA.

[8]  Jonathan D. Wren,et al.  Extending the mutual information measure to rank inferred literature relationships , 2004, BMC Bioinformatics.

[9]  Neil R. Smalheiser,et al.  Artificial Intelligence An interactive system for finding complementary literatures : a stimulus to scientific discovery , 1995 .

[10]  Padmini Srinivasan,et al.  Text mining: Generating hypotheses from MEDLINE , 2004, J. Assoc. Inf. Sci. Technol..

[11]  Marc Weeber,et al.  Case Report: Generating Hypotheses by Discovering Implicit Associations in the Literature: A Case Report of a Search for New Potential Therapeutic Uses for Thalidomide , 2003, J. Am. Medical Informatics Assoc..

[12]  R. DiGiacomo,et al.  Fish-oil dietary supplementation in patients with Raynaud's phenomenon: a double-blind, controlled, prospective study. , 1989, The American journal of medicine.

[13]  D. Swanson Fish Oil, Raynaud's Syndrome, and Undiscovered Public Knowledge , 2015, Perspectives in biology and medicine.

[14]  Amit P. Sheth,et al.  A graph-based recovery and decomposition of Swanson's hypothesis using semantic predications , 2013, J. Biomed. Informatics.

[15]  Don R. Swanson,et al.  Two medical literatures that are logically but not bibliographically connected , 1987, J. Am. Soc. Inf. Sci..

[16]  Michael D. Gordon,et al.  Toward Discovery Support Systems: A Replication, Re-Examination, and Extension of Swanson's Work on Literature-Based Discovery of a Connection between Raynaud's and Fish Oil , 1996, J. Am. Soc. Inf. Sci..

[17]  Neil R. Smalheiser,et al.  Ranking indirect connections in literature-based discovery: The role of medical subject headings , 2006, J. Assoc. Inf. Sci. Technol..