Automatic extraction of numerical values from unstructured data in EHRs

Clinical data recorded in modern EHRs are very rich, but their secondary use for research and medical decision-making can be complicated by several factors (eg, missing and incorrect data, data spread over several clinical databases, information available only within unstructured narrative documents). We address the processing of narrative documents: detecting and extracting numerical values and associating them with the corresponding concepts (or themes) and units. We use supervised categorisation with conditional random fields (CRFs) to detect segments (themes, numerical sequences and units), and a rule-based system to associate these segments with one another in order to build semantically meaningful sequences. The average results obtained are competitive (0.96 precision, 0.78 recall, and 0.86 F-measure), and we plan to apply the system to larger clinical datasets.
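
To illustrate the second stage, the following is a minimal sketch of how labelled segments might be associated into (theme, value, unit) sequences. It is not the rule set used in this work, which the abstract does not detail. The label names (THEME, NUM, UNIT, O), the `max_gap` window, the `Measurement` type and the two rules (nearest preceding theme, immediately following unit) are all assumptions made for the example, and the input is presumed to come from an upstream CRF tagger.

```python
from typing import List, NamedTuple, Optional, Tuple


class Measurement(NamedTuple):
    """One associated sequence; theme or unit may be missing."""
    theme: Optional[str]
    value: float
    unit: Optional[str]


def associate(segments: List[Tuple[str, str]], max_gap: int = 3) -> List[Measurement]:
    """Link each NUM segment to a theme and a unit using two hypothetical rules.

    Rule 1 (assumed): the theme is the nearest preceding THEME segment,
    provided it lies at most `max_gap` positions before the number.
    Rule 2 (assumed): the unit is the segment immediately after the number,
    if that segment is labelled UNIT.
    """
    results: List[Measurement] = []
    last_theme: Optional[str] = None
    last_theme_idx: Optional[int] = None

    for i, (text, label) in enumerate(segments):
        if label == "THEME":
            last_theme, last_theme_idx = text, i
        elif label == "NUM":
            theme = None
            if last_theme_idx is not None and i - last_theme_idx <= max_gap:
                theme = last_theme
            unit = None
            if i + 1 < len(segments) and segments[i + 1][1] == "UNIT":
                unit = segments[i + 1][0]
            # Accept a decimal comma as well as a decimal point.
            results.append(Measurement(theme, float(text.replace(",", ".")), unit))
    return results


# Hypothetical tagger output for "weight is 72,5 kg and creatinine 80 umol/L".
example = [
    ("weight", "THEME"), ("is", "O"), ("72,5", "NUM"), ("kg", "UNIT"),
    ("and", "O"), ("creatinine", "THEME"), ("80", "NUM"), ("umol/L", "UNIT"),
]
measurements = associate(example)
```

Keeping the association step as explicit rules over segment labels, rather than folding it into the tagger, makes each linking decision easy to inspect and adjust; a real rule set would also need to handle cases this sketch ignores, such as ranges, dates and values that precede their theme.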
