Paper Information - Identifying Segment Topics in Medical Dictations

Identifying Segment Topics in Medical Dictations

In this paper, we describe the use of lexical and semantic features for topic classification in dictated medical reports. First, we employ SVM classification to assign whole reports to coarse work-type categories. Next, text segments and their topics are identified in the output of automatic speech recognition. To do so, each word is assigned a work-type-specific topic label based on features extracted from a sliding context window, again using SVM classification with semantic features. Finally, classifier stacking is used for a posteriori error correction, yielding a further improvement in classification accuracy.
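The per-word feature extraction and the stacking step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `window_features` and `stacked_features`, the window width, the feature naming scheme, and the example tokens and labels are all hypothetical. Only lexical features are shown; the semantic features and the SVM training itself are omitted.

```python
from collections import Counter


def window_features(tokens, i, half_width=2):
    """Bag-of-words features for token i, taken from a symmetric context window.

    Hypothetical helper: the paper's actual feature set (including its
    semantic features) is not reproduced here.
    """
    lo = max(0, i - half_width)
    hi = min(len(tokens), i + half_width + 1)
    feats = Counter()
    for j in range(lo, hi):
        feats["w=" + tokens[j].lower()] += 1
    # Mark the word being labeled separately from its context.
    feats["cur=" + tokens[i].lower()] = 1
    return dict(feats)


def stacked_features(tokens, first_pass_labels, i, half_width=2):
    """Second-stage features: the lexical window plus the first-stage
    labels predicted for neighboring words (assumed stacking scheme)."""
    feats = window_features(tokens, i, half_width)
    lo = max(0, i - half_width)
    hi = min(len(tokens), i + half_width + 1)
    for j in range(lo, hi):
        if j != i:
            feats["label[%+d]=%s" % (j - i, first_pass_labels[j])] = 1
    return feats


# Made-up example: a short token sequence and hypothetical first-pass labels.
tokens = "patient denies chest pain plan continue aspirin".split()
first_pass = ["history", "history", "history", "plan", "plan", "plan", "plan"]

print(window_features(tokens, 0, half_width=1))
print(stacked_features(tokens, first_pass, 3, half_width=1))
```

In a full pipeline, feature dictionaries like these would be vectorized and passed to an SVM trainer. The second-stage classifier sees the first-stage labels of neighboring words, which is what would allow it to correct isolated labeling errors.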

Jeremy Jancsary | Harald Trost | Johannes Matiasek | Alexandra Klein
