论文信息 - Empirical Ontologies for Cohort Identification

Empirical Ontologies for Cohort Identification

The growth of patient data stored in Electronic Medical Records (EMR) has greatly expanded the potential for the evidence-based improvement of clinical practice. The proper re-use of this clinical information, however, does not replace basic research techniques — it augments them. The Text REtrieval Conference 2011 Medical Records Track explored how information retrieval may support clinical research by providing an efficient means to identify cohorts for clinical studies. Mayo Clinic NLP’s submission to the TREC Medical Records track attempts information retrieval at a semantic level, combining two disparate means of computing clinical semantics. Substantial effort has gone into the development of precise semantic specification of concepts in medical ontologies and terminologies[1, ?]. But human clinicians do not generate clinical text by referring to such resources, and ontology creators do not base their terminology design on clinical text — so the distribution of ontology concepts in actual clinical texts may differ greatly. Therefore, in representing clinical reports for cohort identification, we advocate for a model that makes use of expert knowledge, is empirically validated, and considers context. This is accomplished through a new framework: empirical ontologies. Patient cohort identification is thus a practical use case for the techniques in our recent work on clinical concept frequency comparisons[2, 3]. The rest of this paper describes the TREC 2011 Medical Records task, describes Mayo Clinic’s run submissions, and reports evaluation results with subsequent discussion.

[1] D. Lindberg,et al. Unified Medical Language System , 2020, Definitions.

[2] Sunghwan Sohn,et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications , 2010, J. Am. Medical Informatics Assoc..

[3] Hongfang Liu,et al. Semantic characteristics of NLP-extracted concepts in clinical notes vs. biomedical literature. , 2011, AMIA ... Annual Symposium proceedings. AMIA Symposium.

[4] Cui Tao,et al. Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis , 2012, J. Am. Medical Informatics Assoc..

[5] Alfred V. Aho,et al. Efficient string matching , 1975, Commun. ACM.