DSR: A Collection for the Evaluation of Graded Disease-Symptom Relations

The effective extraction of ranked disease-symptom relationships is a critical component of various medical tasks, such as computer-assisted medical diagnosis and the discovery of unexpected associations between diseases. Although existing disease-symptom relationship extraction methods serve as the foundation for these tasks, no collection is available to systematically evaluate their performance. In this paper, we introduce the Disease-Symptom Relation Collection (dsr-collection), created by five physicians as expert annotators. We provide graded symptom judgments for diseases by differentiating between relevant symptoms and primary symptoms. Further, we provide several strong baselines based on methods used in previous studies. The first method is based on word embeddings, and the second on co-occurrences of MeSH keywords of medical articles. For the co-occurrence method, we propose an adaptation that considers not only keywords but also the full text of medical articles. The evaluation on the dsr-collection shows the effectiveness of the proposed adaptation in terms of nDCG, precision, and recall.
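To illustrate how graded judgments of this kind can feed into nDCG, the following is a minimal sketch rather than the evaluation code used for the collection. It assumes a hypothetical gain mapping (primary symptom = 2, relevant symptom = 1, unjudged = 0) and the common log2 rank discount; the names `GRADES`, `dcg`, and `ndcg`, as well as the example symptoms, are illustrative assumptions.

```python
import math

# Hypothetical gain values for the two judgment grades (an assumption,
# not taken from the collection): primary symptoms count more than
# merely relevant ones; unjudged symptoms get a gain of 0.
GRADES = {"primary": 2, "relevant": 1}


def dcg(gains):
    """Discounted cumulative gain with the common log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_symptoms, judgments, k=10):
    """nDCG@k of a ranked symptom list for one disease.

    ranked_symptoms: symptoms ordered by a method's score, best first.
    judgments: dict mapping a symptom to "primary" or "relevant".
    """
    # Gains of the returned ranking, truncated at rank k.
    gains = [GRADES.get(judgments.get(s), 0) for s in ranked_symptoms[:k]]
    # Ideal ranking: all judged symptoms sorted by decreasing gain.
    ideal = sorted((GRADES[g] for g in judgments.values()), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Illustrative judgments for a single disease (made-up example data).
judgments = {"fever": "primary", "cough": "relevant", "headache": "relevant"}
score = ndcg(["cough", "nausea", "fever"], judgments)
```

Under this mapping, a ranking that places the primary symptom first scores higher than one that places it below relevant or unjudged symptoms, which is the distinction a binary relevance measure would miss.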
