Unicode-based graphemic systems for limited resource languages

Large vocabulary continuous speech recognition systems require a mapping from words, or tokens, into sub-word units, both to enable robust estimation of acoustic model parameters and to model words not seen in the training data. The standard approach is to manually generate a lexicon in which words are mapped into phones, often with attributes associated with each phone. Context-dependent acoustic models are then constructed using decision trees, where the questions asked are based on the phones and phone attributes. For low-resource languages, it may not be practical to generate a lexicon manually. An alternative is to use a graphemic lexicon, where the “pronunciation” of a word is defined by the letters forming that word. This paper proposes a simple approach for building graphemic systems for any language written in Unicode. The attributes for graphemes are derived automatically using features from the Unicode character descriptions, and these attributes are then used in decision tree construction. The approach is evaluated on the IARPA Babel Option Period 2 languages and on a Levantine Arabic conversational telephone speech (CTS) task. It achieves performance that is comparable, and complementary, to that of phonetic lexicon-based approaches.
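
As a rough illustration of the idea, the sketch below derives a few grapheme attributes from Unicode character descriptions using Python's standard `unicodedata` module, and builds a graphemic "pronunciation" by splitting a word into its letters. This is not the paper's implementation: the function names `grapheme_attributes` and `graphemic_pronunciation`, the attribute set, the use of the first word of the character name as a script guess, and the choice to drop punctuation are all assumptions made for this example.

```python
import unicodedata


def grapheme_attributes(ch):
    """Derive illustrative attributes for one character from its Unicode description.

    Hypothetical helper: the attribute set is an assumption for this sketch.
    """
    # NFD splits a precomposed character into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", ch)
    base = decomposed[0]
    name = unicodedata.name(base, "")  # e.g. "LATIN SMALL LETTER A"
    words = name.split()
    return {
        "base": base,
        "name": name,
        "category": unicodedata.category(base),  # e.g. "Ll", "Lu", "Lo"
        # Heuristic assumption: the first word of the name usually names the script.
        "script_guess": words[0] if words else "",
        "is_capital": "CAPITAL" in words,
        # Names of any combining marks attached to the base character.
        "diacritics": [
            unicodedata.name(m, "")
            for m in decomposed[1:]
            if unicodedata.category(m).startswith("M")
        ],
    }


def graphemic_pronunciation(word):
    """Map a word to its letter sequence (hypothetical helper).

    Assumption: only letters and combining marks are kept; punctuation is dropped.
    """
    composed = unicodedata.normalize("NFC", word)
    return [ch for ch in composed if unicodedata.category(ch)[0] in ("L", "M")]
```

For example, `grapheme_attributes("\u00e9")` reports the base character `e` with a `COMBINING ACUTE ACCENT` diacritic, so a decision-tree question such as "is the base grapheme `e`?" or "does the grapheme carry an accent?" could be asked without any hand-written attribute table. How such attributes are turned into actual tree questions is left open here.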
