RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition

Although much effort has gone into enhancing Romanian speech and language resources over the last decade, Romanian is still considered an under-resourced language. Large speech corpora are available for research and commercial applications in many other languages, but the largest publicly available Romanian corpus to date comprises fewer than 50 hours of speech. In this context, the Speech and Dialogue (SpeeD) research group releases the Read Speech Corpus (RSC), a Romanian speech corpus developed in-house that comprises 100 hours of speech recordings from 164 different speakers. The paper describes the development of the corpus and presents baseline automatic speech recognition (ASR) results obtained with state-of-the-art ASR technology, namely the Kaldi speech recognition toolkit.
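The abstract does not name the evaluation metric behind the baseline results. ASR baselines are conventionally reported as word error rate (WER): substitutions, deletions and insertions divided by the number of reference words. The sketch below shows how a hypothesis transcript can be scored against a reference under that assumption. The function `word_error_rate` is hypothetical and is not part of Kaldi, which ships its own scoring scripts. The example sentences are likewise made up and are not taken from RSC.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N using a word-level Levenshtein distance.

    Hypothetical helper for illustration; assumes whitespace-tokenized,
    already-normalized transcripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr

    # Total edits divided by the number of reference words.
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "aceasta este o propozitie de test"
    hyp = "aceasta este propozitie de text"
    # One deletion ("o") and one substitution ("test" -> "text") over 6 words.
    print(f"WER = {word_error_rate(ref, hyp):.2%}")
```

Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.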
