The annotation station: an open-source technology for annotating large biomedical databases

We present a new framework for annotating large databases of multichannel clinical data such as MIMIC II. MIMIC II is an ICU database that includes both regularly sampled (but often discontinuous) high-rate data, such as ECG and blood pressure (BP) waveforms, and low-resolution data, such as waveform-derived averages, lab results, medication changes, fluid balances, and nurse-verified signal values, which are often sparse, asynchronous and irregularly sampled. Because MIMIC II data are so rich and high-dimensional, a vast quantity of labeled data is required to test and validate ICU decision-support algorithms. MIMIC II therefore presents a new annotation challenge that existing annotation frameworks cannot meet, owing to the heterogeneity of its data types and the unavailability of data. We have constructed a hardware/software configuration known as the "annotation station": a quad-monitor, time-synchronized viewing tool that displays all of these data in an organized fashion. The software allows the user to produce annotations in a practical format that serves the goals of the MIMIC II project. The annotation structure must apply to all the numeric signals in MIMIC II as well as to non-numeric data such as nursing notes, discharge summaries and patient histories. Furthermore, for the annotation framework to represent the state of the patient adequately to a human or a machine, it must involve clinical coding using accepted medical lexicons and causal linkage of one annotation to another. This linkage is the basis for causal reasoning between significant events in different streams of the data. The annotations also include subjective expert assessments of a patient's hemodynamic state and trajectory. Together, these provide objective and subjective labels for evaluating algorithms that track trends in the data, with a view to producing intelligent alarms.
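To make the idea of coded, causally linked annotations concrete, here is a minimal sketch of one possible record structure. It is not the annotation station's actual format: the `Annotation` class, its field names, the `causal_ancestors` and `overlapping` helpers, and the placeholder concept codes (`CUI_PLACEHOLDER_n`, standing in for real lexicon codes such as UMLS concept identifiers) are all hypothetical, chosen only to illustrate annotations tied to a data stream and a time span, coded against a lexicon, and linked to the annotations judged to have caused them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Annotation:
    """Hypothetical annotation record attached to one data stream."""
    ann_id: str
    stream: str              # source stream, e.g. "ABP waveform" or "nursing note"
    start_s: float           # onset, in seconds from the start of the record
    end_s: Optional[float]   # None for an instantaneous event
    concept: str             # human-readable label
    code: str                # lexicon code (placeholder values in this sketch)
    causes: List[str] = field(default_factory=list)  # ids of causing annotations


def causal_ancestors(ann_id: str, index: Dict[str, Annotation]) -> List[str]:
    """Return ids of all direct and indirect causes of ann_id, breadth-first.

    A visited set prevents infinite loops if the links ever form a cycle;
    links to ids missing from the index are skipped.
    """
    order: List[str] = []
    seen = {ann_id}
    queue = list(index[ann_id].causes)
    while queue:
        cur = queue.pop(0)
        if cur in seen or cur not in index:
            continue
        seen.add(cur)
        order.append(cur)
        queue.extend(index[cur].causes)
    return order


def overlapping(t0: float, t1: float, anns: List[Annotation]) -> List[str]:
    """Ids of annotations whose time span intersects the window [t0, t1],
    as a time-synchronized display might query them."""
    hits = []
    for a in anns:
        end = a.start_s if a.end_s is None else a.end_s
        if a.start_s <= t1 and end >= t0:
            hits.append(a.ann_id)
    return hits


# Illustrative (invented) chain: a medication change followed by hypotension,
# followed by tachycardia, each linked to its presumed cause.
a1 = Annotation("a1", "medication log", 3600.0, None,
                "vasodilator dose increased", "CUI_PLACEHOLDER_1")
a2 = Annotation("a2", "ABP waveform", 3900.0, 4500.0,
                "hypotension", "CUI_PLACEHOLDER_2", causes=["a1"])
a3 = Annotation("a3", "ECG waveform", 3950.0, 4400.0,
                "sinus tachycardia", "CUI_PLACEHOLDER_3", causes=["a2"])
index = {a.ann_id: a for a in (a1, a2, a3)}

print(causal_ancestors("a3", index))            # ['a2', 'a1']
print(overlapping(4000.0, 4600.0, [a1, a2, a3]))  # ['a2', 'a3']
```

Storing only the ids of causing annotations keeps each record small and lets cross-stream chains (medication log to arterial pressure to ECG, in this invented example) be reconstructed on demand by traversal.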
