Unsupervised speech transcription and alignment based on two complementary ASR systems

An acoustic model is a necessary component of an automatic speech recognition (ASR) system. Acoustic models are trained on large amounts of speech recordings with transcriptions; usually, hundreds of transcribed recordings are required. Creating manual transcriptions is a very time- and resource-consuming process. Acoustic models may instead be obtained automatically through unsupervised acoustic model training, which uses online speech resources. The collected speech data are recognized with a low-resourced ASR system. Unsupervised techniques filter the erroneous hypotheses out of the recognition result and use the rest for acoustic model training. This paper presents unsupervised methods for generating speech corpora for acoustic model training.
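
The filtering step can be illustrated with a minimal sketch. It assumes, following the title, that each speech segment has been recognized by two complementary ASR systems, and that a segment is kept only when the two hypotheses largely agree. The abstract does not specify the selection criterion, so the word-level agreement measure, the 0.9 threshold, and the names `word_agreement`, `select_segments`, `hyp_a`, and `hyp_b` are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def word_agreement(hyp_a: str, hyp_b: str) -> float:
    """Word-level similarity of two hypotheses, in the range 0.0 to 1.0.

    Assumption: similarity is difflib's ratio 2*M/T, where M is the number
    of matching words and T is the total word count of both hypotheses.
    """
    a, b = hyp_a.split(), hyp_b.split()
    if not a or not b:
        # An empty hypothesis carries no usable training material.
        return 0.0
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def select_segments(segments, threshold=0.9):
    """Keep segments whose two hypotheses agree at or above the threshold.

    Each segment is a dict with hypothetical keys "hyp_a" and "hyp_b"
    holding the output of the first and second ASR system.
    """
    return [
        seg for seg in segments
        if word_agreement(seg["hyp_a"], seg["hyp_b"]) >= threshold
    ]


segments = [
    {"id": 1, "hyp_a": "the court is now in session",
     "hyp_b": "the court is now in session"},
    {"id": 2, "hyp_a": "the court is now in session",
     "hyp_b": "the cord is now in session"},
    {"id": 3, "hyp_a": "", "hyp_b": "please be seated"},
]

kept = select_segments(segments)
print([seg["id"] for seg in kept])
```

With the threshold at 0.9, only the segment on which both systems agree word for word survives; lowering the threshold admits more data at the cost of more transcription errors in the resulting corpus.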
