Connecting the Dots: Hypotheses Generation by Leveraging Semantic Shifts

Literature-based Discovery (LBD) (a.k.a. Hypotheses Generation) is a systematic knowledge discovery process that elicits novel inferences about previously unknown scientific knowledge by rationally connecting complementary and non-interactive literature. Prompt identification of such novel knowledge benefits not only researchers but also other stakeholders such as universities, funding bodies and academic publishers. Almost all prior LBD research suffers from two major limitations. The first is an over-reliance on domain-dependent resources, which restricts the models' applicability to certain domains and problems. To address this, we propose a generalisable LBD model that supports both cross-domain and cross-lingual knowledge discovery. The second persistent deficiency is the sole focus on a static snapshot of the corpus (i.e., ignoring the temporal evolution of topics) to detect new knowledge. However, knowledge in the scientific literature changes dynamically, so relying solely on a static snapshot limits a model's ability to capture semantically meaningful connections. We therefore propose a novel temporal model that captures the semantic change of topics using diachronic word embeddings to uncover more accurate connections. The model was evaluated on the largest available literature repository to demonstrate the efficiency of the proposed cues in recommending novel knowledge.

Electronic supplementary material: The online version of this chapter (10.1007/978-3-030-47436-2_25) contains supplementary material, which is available to authorized users.
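The abstract does not say how semantic change is measured. As one possible illustration, the sketch below scores how far a topic term's meaning shifts between two time slices using a local-neighbourhood measure in the spirit of Hamilton, Leskovec and Jurafsky (2016): take the union of the term's nearest neighbours in both slices and compare the term's cosine similarities to those neighbours across slices. This is an assumption for illustration, not the authors' model. Per-slice embeddings (for example, word2vec vectors trained separately on each period) are taken as given, and the names `local_neighbourhood_shift`, `slice_1990s` and `slice_2010s`, along with the toy vectors, are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical format: one dict per time slice, mapping word -> embedding vector.
Embedding = Dict[str, Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(word: str, emb: Embedding, k: int) -> List[str]:
    """Return the k words most cosine-similar to `word` within one slice.

    Ties are broken alphabetically so results are deterministic.
    """
    target = emb[word]
    scored = [(cosine(target, vec), other) for other, vec in emb.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:k]]


def local_neighbourhood_shift(word: str, emb_a: Embedding, emb_b: Embedding, k: int = 2) -> float:
    """Cosine distance between the second-order similarity vectors of `word`.

    0.0 means the word relates to its neighbours identically in both slices;
    larger values indicate a bigger semantic shift.
    """
    shared = set(emb_a) & set(emb_b)
    if word not in shared:
        raise KeyError(f"{word!r} must occur in both time slices")
    # Restrict both slices to the shared vocabulary before searching neighbours.
    a = {w: emb_a[w] for w in shared}
    b = {w: emb_b[w] for w in shared}
    neighbours = sorted(set(nearest_neighbours(word, a, k)) | set(nearest_neighbours(word, b, k)))
    # Second-order vectors: similarity of `word` to each neighbour, per slice.
    sims_a = [cosine(a[word], a[n]) for n in neighbours]
    sims_b = [cosine(b[word], b[n]) for n in neighbours]
    return 1.0 - cosine(sims_a, sims_b)


# Toy 2-D vectors invented for illustration; real slices would come from
# embeddings trained separately on each period of the corpus.
slice_1990s: Embedding = {
    "virus": [1.0, 0.0],
    "infection": [0.9, 0.1],
    "disease": [0.8, 0.2],
    "software": [0.1, 0.9],
    "computer": [0.0, 1.0],
}
slice_2010s: Embedding = {**slice_1990s, "virus": [0.1, 0.9]}  # only "virus" moves

if __name__ == "__main__":
    for term in ("virus", "infection"):
        print(term, round(local_neighbourhood_shift(term, slice_1990s, slice_2010s), 3))
```

Because this measure compares only within-slice similarities, the two embedding spaces do not need to be aligned. A common alternative is to align consecutive slices with orthogonal Procrustes and take the cosine distance between a word's own vectors, which is more sensitive to global drift than to changes in a term's local neighbourhood.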
