MedTruth: A Semi-supervised Approach to Discovering Knowledge Condition Information from Multi-Source Medical Data

A knowledge graph (KG) contains entities and the relations between them. Owing to their representational power, KGs have been successfully applied to support many medical and healthcare tasks. In the medical domain, however, knowledge holds only under certain conditions. Such conditions are crucial for decision-making in many medical applications, yet they are missing from existing medical KGs. In this paper, we aim to discover medical knowledge conditions from text to enrich KGs. Electronic Medical Records (EMRs) are systematized collections of clinical data that contain detailed information about patients, which makes them a good resource for discovering medical knowledge conditions. Unfortunately, the number of available EMRs is limited, for reasons such as regulation. Meanwhile, a large amount of medical question answering (QA) data is available, which can greatly help the studied task. However, the quality of medical QA data varies widely, which may degrade the quality of the discovered conditions. In light of these challenges, we propose MedTruth, a new truth discovery method for medical knowledge condition discovery, which incorporates prior source quality information into the source reliability estimation procedure and also utilizes knowledge triple information for trustworthy information computation. We conduct a series of experiments on real-world medical datasets to demonstrate that the proposed method can discover meaningful and accurate conditions for medical knowledge by leveraging both EMR and QA data. Further, we test the proposed method on synthetic datasets to validate its effectiveness under various scenarios.
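The abstract describes the method only at a high level: truth discovery that alternates between computing trustworthy values and estimating source reliability, with prior source quality folded into the reliability step. The sketch below illustrates that general idea with a generic iterative weighted-voting loop; it is not the MedTruth algorithm. The function name `truth_discovery`, the pseudo-count `alpha`, the `-log(error rate)` weighting, and the toy sources `emr`, `qa_a`, `qa_b` are all hypothetical assumptions, and the sketch does not model the knowledge-triple information that MedTruth also uses.

```python
import math
from collections import defaultdict

EPS = 1e-6


def _weight_from_error(err: float) -> float:
    """Map an error rate to a positive source weight: lower error -> larger weight."""
    err = min(max(err, EPS), 1.0 - EPS)
    return -math.log(err)


def truth_discovery(claims, prior_quality, alpha=2.0, n_iter=10):
    """Hypothetical iterative weighted-vote truth discovery with a source-quality prior.

    claims:        {source: {object: claimed_value}}, e.g. object = a knowledge
                   triple, value = the condition a source attaches to it.
    prior_quality: {source: prior accuracy in (0, 1)}; unknown sources get 0.5.
    alpha:         pseudo-count controlling how strongly the prior is kept.
    """
    # Initialise weights from the prior alone, so the first vote already trusts
    # sources with a high prior (e.g. EMRs) more than noisier ones (e.g. QA posts).
    weights = {s: _weight_from_error(1.0 - prior_quality.get(s, 0.5)) for s in claims}
    objects = sorted({o for c in claims.values() for o in c})
    truths = {}
    for _ in range(n_iter):
        # Step 1 (truth computation): weighted vote over the values claimed for each object.
        for o in objects:
            votes = defaultdict(float)
            for s, c in claims.items():
                if o in c:
                    votes[c[o]] += weights[s]
            truths[o] = max(votes, key=votes.get)
        # Step 2 (source reliability estimation): observed error rate, smoothed
        # toward the prior error rate by adding alpha pseudo-claims.
        for s, c in claims.items():
            errors = sum(1 for o, v in c.items() if v != truths[o])
            prior_err = 1.0 - prior_quality.get(s, 0.5)
            weights[s] = _weight_from_error((errors + alpha * prior_err) / (len(c) + alpha))
    return truths, weights


# Toy example (invented data): one EMR source and two QA sources disagree on t1.
claims = {
    "emr": {"t1": "adult", "t2": "mild", "t3": "acute"},
    "qa_a": {"t1": "child", "t2": "mild", "t3": "acute"},
    "qa_b": {"t1": "child", "t2": "severe", "t3": "acute"},
}
prior = {"emr": 0.9, "qa_a": 0.6, "qa_b": 0.6}

truths, weights = truth_discovery(claims, prior)
print(truths)  # {'t1': 'adult', 't2': 'mild', 't3': 'acute'}
```

With the prior, the single EMR source outweighs the two QA sources on `t1`; with a uniform prior of 0.6 for every source, the same loop reduces to majority voting at the first step and settles on `child` for `t1`. The pseudo-count is one simple way to keep a small or noisy source from drifting too far from its prior quality; other priors (for example, Bayesian ones) are equally plausible choices.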
