A systematic review on literature-based discovery workflow

As scientific publication rates increase, knowledge acquisition and the research development process have become more complex and time-consuming. Literature-Based Discovery (LBD), which supports automated knowledge discovery, facilitates this process by eliciting novel knowledge from existing scientific literature. This systematic review provides a comprehensive overview of the LBD workflow by answering nine research questions related to its major components (i.e., input, process, output, and evaluation). With regard to the input component, we discuss the data types and data sources used in the literature. For the process component, we present filtering techniques, ranking/thresholding techniques, domains, generalisability levels, and resources. The output component focuses on the visualisation techniques used in the LBD discipline. As for the evaluation component, we outline the evaluation techniques, their generalisability, and the quantitative measures used to validate results. To conclude, we summarise the findings of the review for each component and highlight possible future research directions.

Subjects: Data Mining and Machine Learning, Data Science
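
To make the four workflow components concrete, here is a minimal sketch of one pass from input to evaluation. It assumes Swanson's classic open-discovery ABC co-occurrence model, which this abstract does not itself describe. The toy corpus is invented for illustration; it only borrows the well-known Raynaud's/fish oil example. The function names `open_discovery` and `precision_at_k`, the summed co-occurrence score, and the gold set are likewise hypothetical choices and do not reproduce the method of any reviewed study. Each document is reduced to a set of concept terms (input). Candidate C terms are filtered and ranked, then cut off at `top_k` (process). The result is a ranked hypothesis list (output), which is scored against a gold set with precision@k (evaluation).

```python
from collections import Counter


def cooccurring(corpus, term):
    """Count how often each other term appears in the same document as `term`."""
    counts = Counter()
    for doc in corpus:
        if term in doc:
            counts.update(t for t in doc if t != term)
    return counts


def open_discovery(corpus, a_term, stop_terms=frozenset(), top_k=5):
    """Hypothetical ABC-style open-discovery pass: filter, rank, threshold."""
    # B terms: concepts that co-occur directly with the start concept A.
    a_links = cooccurring(corpus, a_term)
    b_terms = {b for b in a_links if b not in stop_terms}
    scores = Counter()
    for b in b_terms:
        for c, n in cooccurring(corpus, b).items():
            # Filtering: drop A itself, stop terms, and any C already linked to A.
            if c != a_term and c not in a_links and c not in stop_terms:
                # Ranking: accumulate co-occurrence support over all linking B terms.
                scores[c] += n
    # Thresholding: keep only the top_k highest-scoring hypotheses.
    return [c for c, _ in scores.most_common(top_k)]


def precision_at_k(ranked, gold, k):
    """Fraction of the top-k hypotheses that appear in a gold-standard set."""
    if k <= 0:
        return 0.0
    return sum(1 for c in ranked[:k] if c in gold) / k


# Toy input: each document is a set of concept terms (invented for illustration).
corpus = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"raynaud", "vasoconstriction"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"fish oil", "vasoconstriction"},
    {"aspirin", "platelet aggregation"},
]

hypotheses = open_discovery(corpus, "raynaud")
print(hypotheses)  # ['fish oil', 'aspirin']
print(precision_at_k(hypotheses, {"fish oil"}, k=2))  # 0.5
```

"fish oil" ranks first because three B terms link it to "raynaud", while "aspirin" has only one. Passing `stop_terms={"platelet aggregation"}` removes that bridge, and "aspirin" disappears from the output entirely. This shows how filtering choices in the process component can change what reaches the output and evaluation stages.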