Audio indexing on a medical video database: The AVISON project

This paper presents an overview of our research in the AVISON project, which aims to develop a platform for indexing the surgery videos of the Institute of Research Against Digestive Cancer (IRCAD). The platform is intended to provide user-friendly, query-based access to the IRCAD video database, which is dedicated to the training of international surgeons. Videos are queried through a text-based indexing system whose textual content is obtained with an automatic speech recognition (ASR) system. We present the new approaches we propose for handling these highly specialised data automatically: obtaining a low-cost training corpus, automatically adapting the ASR system, enabling multilingual querying of the videos and, finally, filtering out documents whose transcription errors could degrade the quality of the database.
