A Suite of Natural Language Processing Tools Developed for the I2B2 Project

Textual medical records contain a wealth of information that needs to be extracted and / or indexed in order to be analyzed and interpreted by the automated tools. We have developed a collection of natural language processing (NLP) tools to extract various types of information from unstructured medical records. The generic NLP components, when assembled in pipelines and initialized with custom configuration parameters, become a powerful medical data mining instrument. We have successfully extracted such medical concepts as diagnoses, comorbidities, discharge medications, and smoking status from various types of medical records.