Paper information - Data augmentation and feature extraction using variational autoencoder for acoustic modeling


This paper describes a data augmentation and feature extraction method for acoustic modeling based on a variational autoencoder (VAE). A VAE is a generative model based on variational Bayesian learning within a deep learning framework. It extracts latent values from its input variables and can use them to generate new data; VAEs are widely used to generate pictures and sentences. Here, a VAE is applied to speech corpus data augmentation and to feature vector extraction from speech for acoustic modeling. First, the size of a speech corpus is doubled by encoding latent variables extracted from the original utterances using a VAE framework. Second, because the latent variables extracted from speech waveforms carry latent "meanings" of those waveforms, they can be used as acoustic features for automatic speech recognition (ASR). The paper shows experimentally that data augmentation with a VAE framework is effective and that latent variable-based features can be utilized in ASR.
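The two uses of the VAE named in the abstract (generating extra training data and reading off latent variables as features) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the network architecture, input representation, or dimensions, so `FEAT_DIM`, `LATENT_DIM`, the untrained random linear maps standing in for the encoder and decoder, and all function names are assumptions made only for illustration. The sampling step and KL term follow the standard VAE formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM = 40    # assumed size of one input vector (hypothetical)
LATENT_DIM = 8   # assumed size of the latent variable (hypothetical)

# Untrained random linear maps standing in for the encoder and decoder
# networks; a real VAE would learn these by maximising the ELBO.
W_mu = rng.normal(scale=0.1, size=(FEAT_DIM, LATENT_DIM))
W_logvar = rng.normal(scale=0.1, size=(FEAT_DIM, LATENT_DIM))
W_dec = rng.normal(scale=0.1, size=(LATENT_DIM, FEAT_DIM))


def encode(x):
    """Return the mean and log-variance of q(z | x)."""
    return x @ W_mu, x @ W_logvar


def sample_latent(mu, logvar, rng):
    """Reparameterisation trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z):
    """Map latent variables back to the input space."""
    return z @ W_dec


def kl_to_standard_normal(mu, logvar):
    """KL(q(z | x) || N(0, I)) per example, the regulariser in the ELBO."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


def augment(corpus, rng):
    """Double the corpus with generated data and return latent features."""
    mu, logvar = encode(corpus)
    z = sample_latent(mu, logvar, rng)
    generated = decode(z)
    augmented = np.concatenate([corpus, generated], axis=0)
    return augmented, mu  # the posterior mean serves as the latent feature


# Toy stand-in for a speech corpus: 100 vectors of random values.
corpus = rng.standard_normal((100, FEAT_DIM))
augmented, latent_features = augment(corpus, rng)
kl = kl_to_standard_normal(*encode(corpus))
```

In this sketch `augmented` holds the original vectors followed by one generated vector per original, which is one way to read "the size of a speech corpus is doubled", and `latent_features` is the kind of per-input latent representation that could be passed to an acoustic model in place of conventional features.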

Hiromitsu Nishizaki
