Multilingual query-by-example spoken term detection for under-resourced languages

We propose a query-by-example approach to multilingual Spoken Term Detection (STD) for under-resourced languages, based on Automatic Speech Recognition (ASR). The approach overcomes the main difficulties of this setting in two ways: it provides a new method for building multilingual acoustic models from little annotated data, and it searches approximate ASR transcriptions, which gives high scalability. The acoustic models are obtained by adapting well-trained phoneme models to the phonemes of the target languages. The phoneme mapping is based on the International Phonetic Alphabet (IPA) classification and on a confusion matrix. To improve the search, weights for query length and alignment spread are incorporated into the Dynamic Time Warping (DTW) technique. Experimental validation was conducted on a standard data set of 3 hours of speech in a mix of African languages. The recordings are of telephone quality and combine read and spontaneous speech.
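The abstract does not give the exact search formulas, so the following is only a minimal sketch, under our own assumptions, of how subsequence DTW can search a phoneme transcription for a query with query-length normalization and an alignment-spread penalty. The 0/1 substitution cost in `local_cost`, the `spread_weight` parameter, and the scoring formula are hypothetical illustrations, not the authors' method; a cost derived from a phoneme confusion matrix could replace `local_cost`.

```python
from typing import Optional, Sequence, Tuple


def local_cost(a: str, b: str) -> float:
    # Hypothetical 0/1 phoneme substitution cost (assumption); a
    # confusion-matrix-based cost could be plugged in here instead.
    return 0.0 if a == b else 1.0


def search(query: Sequence[str], doc: Sequence[str],
           spread_weight: float = 0.5) -> Optional[Tuple[float, int, int]]:
    """Best match of `query` inside `doc` as (score, start, end), end inclusive.

    Hypothetical score: accumulated DTW cost divided by the query length,
    plus spread_weight * |span - len(query)| / len(query).
    """
    n, m = len(query), len(doc)
    if n == 0 or m == 0:
        return None
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]  # accumulated alignment cost
    beg = [[0] * m for _ in range(n)]    # doc index where the path started
    for j in range(m):
        # Free start: the first query phoneme may align with any position.
        acc[0][j] = local_cost(query[0], doc[j])
        beg[0][j] = j
    for i in range(1, n):
        for j in range(m):
            if j == 0:
                cands = [(acc[i - 1][0], beg[i - 1][0])]
            else:
                cands = [(acc[i - 1][j - 1], beg[i - 1][j - 1]),  # diagonal, preferred on ties
                         (acc[i - 1][j], beg[i - 1][j]),          # vertical
                         (acc[i][j - 1], beg[i][j - 1])]          # horizontal
            best, src = min(cands, key=lambda c: c[0])
            acc[i][j] = best + local_cost(query[i], doc[j])
            beg[i][j] = src
    result = None
    for j in range(m):
        # Free end: every doc position is a candidate end of the match.
        span = j - beg[n - 1][j] + 1
        # The spread penalty is applied after the DP, so it reranks candidate
        # end positions rather than being optimized inside the recursion.
        score = acc[n - 1][j] / n + spread_weight * abs(span - n) / n
        if result is None or score < result[0]:
            result = (score, beg[n - 1][j], j)
    return result
```

For example, `search(list("ab"), list("xaby"))` locates the exact occurrence at positions 1 to 2 with a score of 0. Normalizing by query length keeps scores of short and long queries comparable, and the spread term discourages alignments that stretch a short query over a long stretch of the transcription.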
