Low Resource Acoustic-to-articulatory Inversion Using Bi-directional Long Short Term Memory

Estimating articulatory movements from speech acoustic features is known as acoustic-to-articulatory inversion (AAI). Training an AAI model for a particular subject, referred to as subject-dependent AAI (SD-AAI), requires a large amount of parallel speech and articulatory motion data from that subject. Electromagnetic articulography (EMA) is a promising technology for recording such parallel data, but it is expensive, time-consuming, and tiring for the subject. To reduce the amount of parallel acoustic-articulatory data needed from a subject, in this work we propose a subject-adaptive AAI method (SA-AAI) that adapts an existing AAI model trained on a large amount of parallel data from a fixed set of subjects. Experiments are performed with acoustic-articulatory data from 30 subjects, with the AAI models trained using a bidirectional long short-term memory (BLSTM) network, to examine how much data is needed from a new target subject for SA-AAI to achieve AAI performance equivalent to that of SD-AAI. Experimental results reveal that the proposed SA-AAI performs similarly to SD-AAI with ∼62.5% less training data. Among the different articulators, the SA-AAI performance for the tongue articulators matches the corresponding SD-AAI performance with only ∼12.5% of the data used for SD-AAI training.
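The intuition behind adapting a multi-subject model, rather than training from scratch on a few target frames, can be illustrated with a minimal sketch under loudly stated assumptions. The BLSTM is swapped for a frame-wise linear regressor so that the example runs with the Python standard library only; the "subjects" are synthetic linear mappings sharing a common component plus a subject-specific offset; and adaptation is implemented as plain fine-tuning (initialize from the pooled model, then continue training on the target subject's frames), which may differ from the paper's exact procedure. All names (`train`, `make_subject`, etc.), sizes, and hyperparameters are hypothetical and do not come from the paper.

```python
import random

def predict(w, b, x):
    """Frame-wise linear prediction y_hat = w . x + b (stand-in for the BLSTM)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def mse(w, b, data):
    """Mean squared error over (features, target) frames."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)

def train(w, b, data, epochs, lr):
    """Full-batch gradient descent on the MSE, starting from (w, b)."""
    w = list(w)
    n = len(data)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            for i, xi in enumerate(x):
                gw[i] += 2.0 * err * xi / n
            gb += 2.0 * err / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b

def make_subject(rng, base_w, spread, n_frames, noise=0.05):
    """Hypothetical synthetic subject: shared mapping base_w plus a subject-specific offset."""
    w_s = [bw + rng.gauss(0.0, spread) for bw in base_w]
    frames = []
    for _ in range(n_frames):
        x = [rng.gauss(0.0, 1.0) for _ in base_w]
        frames.append((x, predict(w_s, 0.0, x) + rng.gauss(0.0, noise)))
    return frames

rng = random.Random(0)
dim = 20
base_w = [rng.gauss(0.0, 1.0) for _ in range(dim)]

# 1) Multi-subject model: pool frames from a fixed set of source subjects.
pool = [f for _ in range(5) for f in make_subject(rng, base_w, 0.2, 80)]
w_pre, b_pre = train([0.0] * dim, 0.0, pool, epochs=60, lr=0.1)

# 2) New target subject: only 8 frames for training, the rest held out.
target = make_subject(rng, base_w, 0.2, 208)
target_train, target_test = target[:8], target[8:]

# 3) Low-resource baseline: train from scratch on the 8 target frames.
w_scr, b_scr = train([0.0] * dim, 0.0, target_train, epochs=1000, lr=0.02)

# 4) Subject adaptation: continue training the pooled model on the same 8 frames.
w_ada, b_ada = train(w_pre, b_pre, target_train, epochs=1000, lr=0.02)

scratch_mse = mse(w_scr, b_scr, target_test)
adapted_mse = mse(w_ada, b_ada, target_test)
print(f"from scratch: {scratch_mse:.3f}  adapted: {adapted_mse:.3f}")
```

With 20 input features and only 8 target frames, the from-scratch fit is underdetermined, so gradient descent leaves the unconstrained directions at their initial values (zero). Starting from the pooled model places those directions close to the target subject's mapping, so the adapted model generalizes far better from the same few frames. This toy only mirrors the reasoning behind SA-AAI needing less target data than SD-AAI; it says nothing about the BLSTM results reported above.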
