Paper information - Improving rare disease classification using imperfect knowledge graph

Improving rare disease classification using imperfect knowledge graph

Accurately recognizing rare diseases based on symptom descriptions is a critical task. The lack of historical data for rare diseases poses a great challenge to machine learning-based approaches. In this study, we develop a text classification algorithm that represents a document as a combination of a “bag of words” and a “bag of knowledge terms,” where a “knowledge term” is a term shared between the document and the knowledge graph relevant to the disease classification task.
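The document representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the whitespace tokenizer, the function names `tokenize` and `represent`, and the example term set `kg_terms` are all hypothetical, and the abstract does not say how the two bags are combined or weighted, so the sketch simply returns both.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for
    # whatever tokenizer the actual system uses.
    return text.lower().split()


def represent(document, knowledge_terms):
    """Return (bag of words, bag of knowledge terms) for one document.

    knowledge_terms is a set of terms taken from a knowledge graph;
    a "knowledge term" is a document term that also occurs in that set.
    """
    tokens = tokenize(document)
    bag_of_words = Counter(tokens)
    bag_of_knowledge = Counter(t for t in tokens if t in knowledge_terms)
    return bag_of_words, bag_of_knowledge


# Hypothetical knowledge-graph vocabulary, for illustration only.
kg_terms = {"fever", "rash", "seizure"}
bow, bok = represent("Patient reports fever and rash and fever", kg_terms)
```

Here `bow` counts every token in the description, while `bok` keeps only the tokens that also appear in the knowledge-graph vocabulary. A classifier could consume the two bags as separate feature groups, though that design choice is an assumption of this sketch.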

[1] Bin Liang, et al. CN-DBpedia: A Never-Ending Chinese Knowledge Extraction System, 2017, IEA/AIE.

[2] Hongfang Liu, et al. Leveraging Collaborative Filtering to Accelerate Rare Disease Diagnosis, 2017, AMIA.

[3] R. D. du Bois, et al. Rare Diseases, 1946, Handbook Integrated Care.

[4] Gideon S. Mann, et al. Learning from labeled features using generalized expectation criteria, 2008, SIGIR '08.

[5] Ole Winther, et al. Rare disease diagnosis: A review of web search, social media and large-scale data-mining approaches, 2015, Rare diseases.

[6] Hongfang Liu, et al. Utilization of Electronic Medical Records and Biomedical Literature to Support the Diagnosis of Rare Diseases Using Data Fusion and Collaborative Filtering Approaches, 2018, JMIR medical informatics.

[7] Thorsten Joachims, et al. Text Categorization with Support Vector Machines: Learning with Many Relevant Features, 1998, ECML.

[8] Hongfang Liu, et al. Incorporating Knowledge-Driven Insights into a Collaborative Filtering Model to Facilitate the Differential Diagnosis of Rare Diseases, 2018, AMIA.

[9] Nick Craswell. Mean Reciprocal Rank, 2009, Encyclopedia of Database Systems.

[10] Burr Settles, et al. Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances, 2011, EMNLP.

[11] Moni Naor, et al. Rank aggregation methods for the Web, 2001, WWW '01.

[12] Bernhard Schölkopf, et al. DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification, 2016, WSDM.

[13] Christopher Ré, et al. Snorkel: Rapid Training Data Creation with Weak Supervision, 2017, Proc. VLDB Endow.

[14] Praveen Paritosh, et al. Freebase: a collaboratively created graph database for structuring human knowledge, 2008, SIGMOD Conference.

[15] Venkatesh Balasubramanian, et al. Slice: Scalable Linear Extreme Classifiers Trained on 100 Million Labels for Related Searches, 2019, WSDM.

[16] Louis-Philippe Morency, et al. Multimodal Machine Learning: A Survey and Taxonomy, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Evgeniy Gabrilovich, et al. Feature Generation for Text Categorization Using World Knowledge, 2005, IJCAI.

[18] David Sontag, et al. Learning a Health Knowledge Graph from Electronic Medical Records, 2017, Scientific Reports.

[19] Ole Winther, et al. FindZebra: A search engine for rare diseases, 2013, Int. J. Medical Informatics.

[20] Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[21] Jiangjiang He, et al. China has officially released its first national list of rare diseases, 2018, Intractable & rare diseases research.

[22] Jason Eisner, et al. Machine Learning with Annotator Rationales to Reduce Annotation Cost, 2008.

[23] Seetha Hari, et al. Learning From Imbalanced Data, 2019, Advances in Computer and Electrical Engineering.

[24] Yiming Yang, et al. A Comparative Study on Feature Selection in Text Categorization, 1997, ICML.

[25] Hema Raghavan, et al. Active Learning with Feedback on Features and Instances, 2006, J. Mach. Learn. Res.

[26] Viktor de Boer, et al. The knowledge graph as the default data model for learning on heterogeneous knowledge, 2017, Data Sci.

[27] Stefanie Putkowski. The National Organization for Rare Disorders (NORD), 2010.

[28] Bartosz Krawczyk, et al. Learning from imbalanced data: open challenges and future directions, 2016, Progress in Artificial Intelligence.