Advances in Speech and Language Technologies for Iberian Languages

This paper presents the results of an analysis of a set of prosodic parameters considered relevant for the expression of emotion in Spanish, carried out on a corpus of read-aloud chat messages, and explores the application of these results to generating emotional synthetic speech with a novel parametric approach. The results show that the analysed parameters appear to be relevant for differentiating among the considered emotions, but that their use in parametric synthesis does not yet reach the desired quality level, although it still outperforms corpus-based techniques.
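
The abstract does not detail which prosodic parameters were analysed; a minimal sketch of extracting typical candidates (F0 statistics, frame energy, duration and speaking rate) from a read-aloud message might look as follows. The function name, parameter set, and use of librosa are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch: utterance-level prosodic statistics of the kind
# often used in emotion analysis (F0, energy, duration, speaking rate).
# librosa is used purely for illustration; the paper's tooling is unknown.
import numpy as np
import librosa

def prosodic_parameters(wav_path, n_syllables=None):
    """Return simple utterance-level prosodic statistics."""
    y, sr = librosa.load(wav_path, sr=16000)

    # F0 contour via the pYIN tracker; unvoiced frames come back as NaN.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0_voiced = f0[~np.isnan(f0)]

    # Frame-level energy (RMS).
    rms = librosa.feature.rms(y=y)[0]

    duration = len(y) / sr
    params = {
        "f0_mean_hz": float(np.mean(f0_voiced)),
        "f0_range_hz": float(np.max(f0_voiced) - np.min(f0_voiced)),
        "energy_mean": float(np.mean(rms)),
        "duration_s": duration,
    }
    # Speaking rate needs a syllable count, e.g. from the transcript.
    if n_syllables is not None:
        params["syllables_per_s"] = n_syllables / duration
    return params
```

Statistics such as these could then be compared across emotion labels, or used as modification targets when driving a parametric synthesiser.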
