Bridging The Evolving Semantics: A Data Driven Approach to Knowledge Discovery In Biomedicine

Recent progress in biological, medical, and health-care technologies, together with innovations in wearable sensors, provides unprecedented opportunities to accumulate massive data for understanding disease prognosis and developing personalized treatments and interventions. These massive data, supplemented by rapid growth in computing infrastructure, have enabled biomedical researchers to perform more comprehensive experiments and detailed case studies. At the same time, performing these experiments is not only monetarily expensive but also time-consuming. Thus, there is a growing need for tools that allow researchers to pose queries that help them focus on interesting “hypotheses”. However, such a tool would require the capability to derive inferences from known relationships between medical concepts. In this paper, we tackle this problem as one of non-factoid question answering, wherein we try to answer user-posted questions by leveraging both authoritative sources and social media posts. While the former provide us with well-established knowledge on well-researched topics, the latter provide us with real-time feedback on a variety of topics such as adverse drug effects (ADEs), symptom-drug relationships, etc. The challenges in leveraging the authoritative sources to infer answers for non-factoid questions lie in: (a) effectively navigating the answer search space to respond to queries in a timely manner, (b) ranking the candidate answers derived in step (a) to enable non-trivial and novel discoveries, and (c) being robust enough to perform both confirmatory and discovery tasks.
