Incorporating Knowledge-Driven Insights into a Collaborative Filtering Model to Facilitate the Differential Diagnosis of Rare Diseases

Rare diseases, although individually uncommon, collectively affect one in ten Americans. Because each disease is so uncommon, patients are often misdiagnosed or left undiagnosed, which prolongs their medical journey. The diagnostic pathway of a rare disease depends heavily on the associated clinical phenotypes, i.e., an individual's observable characteristics at the physical, morphologic, or biochemical level. In our previous study, we applied a collaborative filtering model to clinical data generated at Mayo Clinic to stratify patients into subgroups of rare diseases. Information mined from clinical data, however, usually contains some noise, such as co-occurring comorbidities, which can reduce the accuracy of differential diagnosis. In this study, we sought to incorporate a knowledge-driven approach into collaborative filtering to optimize the results learned from clinical data. Our results demonstrated improved performance over purely data-driven approaches, with the potential to facilitate the differential diagnosis of rare diseases.