Medical information extraction in European Portuguese

The electronic storage of medical patient data is becoming routine in most practices and hospitals worldwide. However, much of the available data is in free-text form, a convenient way of expressing concepts and events but especially challenging for automatic searches, summarization, or statistical analyses. Information Extraction can relieve some of these problems by offering a semantically informed interpretation and abstraction of the texts. MedInX, the Medical Information eXtraction system developed in the context of this PhD dissertation, is designed to process textual clinical discharge records written in Portuguese and to map free-text reports automatically and accurately onto a structured representation. MedInX components are based on Natural Language Processing principles and provide several mechanisms to read, process, and utilize external resources, such as terminologies and ontologies. MedInX's current practical applications include automatic code assignment and an audit system capable of systematically analyzing the content and completeness of the clinical reports. The evaluation of the system on a set of authentic patient discharge letters indicates that the system performs with 95% precision and recall.
