Inference from Structured and Unstructured Electronic Medical Data for Dementia Detection

The prevalence of Alzheimer’s disease (AD) and other forms of dementia is increasing as the population ages, both in the United States and around the globe. Because these conditions cannot be cured, they lead to prolonged and expensive medical care. Early detection is critical for potentially postponing symptoms and for preparing both healthcare providers and families for patients’ future needs. Current detection methods are typically costly or unreliable, so improved recognition of early AD markers would be of substantial benefit. Electronic patient records offer the potential for computational analysis and prediction of complex diseases like AD. Prior work on this problem has focused mainly on structured data (e.g., test results), whereas this study integrates structured and unstructured (e.g., clinical notes) data, obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), to classify subjects’ dementia status. Prediction based on unstructured data alone achieves accuracy similar to that of prediction based on structured data that exclude cognitive markers. Integrating the structured and unstructured models improves performance over either model in isolation. Additionally, we provide insights into which structured features were more useful for classifying AD; these support previously observed trends and highlight the potential for computational methods to discover new clinical markers.
