On the Generative Discovery of Structured Medical Knowledge

Online healthcare services can give the general public ubiquitous access to medical knowledge and reduce the cost of accessing medical information for both individuals and societies. However, expanding the scale of high-quality, structured medical knowledge usually requires tedious data preparation and human annotation. To retain these benefits while minimizing the data required to expand medical knowledge, we take a generative perspective on the relational medical entity pair discovery problem. We propose a generative model, the Conditional Relationship Variational Autoencoder, that discovers meaningful and novel medical entity pairs purely by learning from the expression diversity in existing relational medical entity pairs. Unlike discriminative approaches, in which high-quality contexts and candidate medical entity pairs must be carefully prepared for the model to examine, the proposed model generates novel entity pairs directly by sampling from a learned latent space, with no further data requirement. The model explores generative modeling capacity for medical entity pairs while using deep learning for hands-free feature engineering. It can generate meaningful medical entity pairs that have not yet been observed, and it can also generate entity pairs for a specific medical relationship. The model adjusts the initial representations of medical entities by addressing their relational commonalities. Quantitative and qualitative evaluations on real-world relational medical entity pairs demonstrate that the proposed method generates relational medical entity pairs that are both meaningful and novel.
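The generation step described above (sample from a latent space, condition on a relationship, decode into an entity pair) can be illustrated with a minimal sketch. This is not the paper's implementation: the single linear decoder, the untrained random weights, the nearest-neighbor lookup by cosine similarity, and the toy entity names and embeddings are all assumptions made purely for illustration. With untrained weights the decoded pairs carry no medical meaning; the sketch only shows the data flow.

```python
import math
import random


def one_hot(index, size):
    """Encode a relationship id as a one-hot condition vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def linear(weights, bias, x):
    """Apply one dense layer; weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def nearest_entity(vector, embeddings):
    """Return the entity whose embedding has the highest cosine similarity."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / (norm + 1e-12)
    return max(embeddings, key=lambda name: cosine(vector, embeddings[name]))


def generate_pair(relation_id, num_relations, latent_dim, weights, bias, embeddings, rng):
    """Sample z ~ N(0, I), condition on the relationship, decode to two entities."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    decoded = linear(weights, bias, z + one_hot(relation_id, num_relations))
    half = len(decoded) // 2  # first half: head entity vector, second half: tail
    return (nearest_entity(decoded[:half], embeddings),
            nearest_entity(decoded[half:], embeddings))


# Hypothetical toy vocabulary and embeddings (made up for this sketch).
rng = random.Random(0)
embeddings = {
    "influenza": [0.9, 0.1, 0.0],
    "fever": [0.1, 0.9, 0.0],
    "cough": [0.0, 0.8, 0.3],
    "aspirin": [0.2, 0.0, 0.9],
}
num_relations, latent_dim, embed_dim = 2, 4, 3
in_dim = latent_dim + num_relations
out_dim = 2 * embed_dim

# Untrained decoder weights stand in for a decoder that would be learned.
weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
bias = [0.0] * out_dim

# Relationship id 0 is a hypothetical label, e.g. a disease-symptom relation.
pair = generate_pair(0, num_relations, latent_dim, weights, bias, embeddings, rng)
print(pair)
```

In a trained model of this kind, the decoder weights would be learned jointly with an encoder, and changing `relation_id` would steer generation toward pairs of a different relationship type.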
