One of the research projects running at the medical informatics department of the Institute of Computer Science AS CR explores the representation of medical information and the development of an electronic health record (EHR). This effort raises an interesting problem: how to transfer knowledge from a medical record written as free text into the structured electronic format of the EHR. So far, this task has been solved by writing extraction rules (regular expressions) for every element of information to be extracted from the medical record. However, such an approach is very time-consuming and requires the supervision of a skilled programmer whenever the target area of medicine changes. In this article we explore the possibility of mechanizing this process by automatically generating the extraction rules from a pre-annotated corpus of medical records. Since we are currently in the phase of data acquisition and preliminary tests, we do not present any final results; rather, we sketch the technologies we intend to use and describe the tools developed so far as part of this project.