Semantic and phonetic automatic reconstruction of medical dictations

Automatic speech recognition (ASR) has become a valuable tool in large-scale document production environments such as medical dictation. While manual post-processing is still needed to correct speech-recognition errors and to produce documents that adhere to various stylistic and formatting conventions, a large part of the document production process is carried out by the ASR system. Improving the quality of the system output requires knowledge about the multi-layered relationship between the dictated texts and the final documents: with such knowledge, typical speech-recognition errors can be avoided, and proper style and formatting can be anticipated in the ASR part of the document production process. Yet, while vast amounts of recognition results and manually edited final reports are constantly being produced, error-free literal transcripts of the actually dictated texts remain a scarce and costly resource, because they have to be created by manually transcribing the audio recordings. To obtain large corpora of literal transcripts for medical dictation, we propose a method for automatically reconstructing them from draft speech-recognition transcripts and the corresponding final medical reports. The main innovative aspect of our method is the combination of two independent knowledge sources: phonetic information for identifying speech-recognition errors and semantic information for detecting post-editing concerning format and style. Speech-recognition results and final reports are first aligned, then matched based on semantic and phonetic similarity, and finally categorised and selectively combined into a reconstruction hypothesis. The method can serve various applications in language technology, e.g., ASR adaptation, document production, or, more generally, the development of parallel text corpora from non-literal text resources. In an experimental evaluation, which also includes an assessment of the quality of the reconstructed transcripts compared to manual transcriptions, the method yields a relative word-error-rate reduction of 7.74% after the standard language model is retrained with reconstructed transcripts.
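
To make the align-match-combine step more concrete, the following minimal Python sketch illustrates the selective-combination idea under simplifying assumptions: the ASR draft and the final report are aligned with a generic word-level sequence matcher, and "phonetic" similarity is approximated by a normalised letter-level Levenshtein distance (a real system would compare phoneme sequences obtained from a pronunciation lexicon, and would also use semantic similarity). All function names, the threshold, and the toy data are illustrative and not part of the published method.

    from difflib import SequenceMatcher

    def levenshtein(a: str, b: str) -> int:
        """Plain dynamic-programming string edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (ca != cb)))   # substitution
            prev = curr
        return prev[-1]

    def phonetic_similarity(a: str, b: str) -> float:
        """Similarity in [0, 1]; letters stand in for phoneme sequences here."""
        a, b = a.lower(), b.lower()
        if not a and not b:
            return 1.0
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))

    def reconstruct(asr_words, report_words, threshold=0.5):
        """Combine ASR draft and final report into a reconstruction hypothesis.

        Matching segments are copied as-is; for mismatching segments,
        phonetic similarity decides whether the report wording reflects a
        corrected recognition error (take the report) or a stylistic
        rewrite by the transcriptionist (keep the ASR draft, which is
        then closer to the literal dictation).
        """
        hypothesis = []
        matcher = SequenceMatcher(None, asr_words, report_words)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            asr_seg, rep_seg = asr_words[i1:i2], report_words[j1:j2]
            if op == "equal":
                hypothesis.extend(asr_seg)       # both sources agree
            elif phonetic_similarity(" ".join(asr_seg),
                                     " ".join(rep_seg)) >= threshold:
                hypothesis.extend(rep_seg)       # likely corrected ASR error
            else:
                hypothesis.extend(asr_seg)       # likely format/style post-edit
        return hypothesis

    asr_draft = "the patient has a history of die beaties".split()
    final_report = "the patient has a history of diabetes".split()
    print(" ".join(reconstruct(asr_draft, final_report)))
    # -> the patient has a history of diabetes

In this toy example, the misrecognised "die beaties" is phonetically close to the report's "diabetes", so the report wording is adopted into the reconstruction; a phonetically dissimilar mismatch (e.g. a heading inserted by the transcriptionist) would instead fall back on the ASR wording as the better approximation of what was actually dictated.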
