Automated tools for phenotype extraction from medical records.

Clinical research studying critical illness phenotypes relies on the identification of clinical syndromes defined by consensus definitions. Historically, identifying phenotypes has required manual chart review, a time and resource intensive process. The overall research goal of C ritical I llness PH enotype E xt R action (deCIPHER) project is to develop automated approaches based on natural language processing and machine learning that accurately identify phenotypes from EMR. We chose pneumonia as our first critical illness phenotype and conducted preliminary experiments to explore the problem space. In this abstract, we outline the tools we built for processing clinical records, present our preliminary findings for pneumonia extraction, and describe future steps.