Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, combine classic artificial-neural-network HMMs with traditional context-dependent acoustic modeling and deep-belief-network pre-training. CD-DNN-HMMs greatly outperform conventional CD-GMM (Gaussian mixture model) HMMs: the word error rate is reduced by up to one third on the difficult benchmarking task of speaker-independent single-pass transcription of telephone conversations.
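To make the hybrid architecture concrete, here is a minimal NumPy sketch of the scoring step that defines a CD-DNN-HMM: a feed-forward DNN estimates posteriors over tied triphone states (senones), which are divided by the state priors to obtain the scaled likelihoods that the HMM decoder consumes. All dimensions, layer counts, and parameter values below are illustrative assumptions, not the configuration reported in the paper.

```python
import numpy as np

# Illustrative sizes only; the paper's systems use 9000+ senones,
# thousands of hidden units, and up to 9 hidden layers.
FEAT_DIM = 429        # e.g. 11 stacked frames of 39-dim features (assumption)
HIDDEN_DIM = 2048
NUM_SENONES = 9304

def dnn_posteriors(x, weights, biases):
    """Feed-forward pass: sigmoid hidden layers, softmax output
    giving P(senone | acoustic observation x)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))   # sigmoid hidden layer
    logits = h @ weights[-1] + biases[-1]
    logits -= logits.max()                        # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def scaled_log_likelihood(x, weights, biases, log_priors):
    """HMM emission score: log p(x|s) is approximated, up to a constant,
    by log P(s|x) - log P(s) (Bayes' rule with the p(x) term dropped)."""
    return np.log(dnn_posteriors(x, weights, biases) + 1e-10) - log_priors

# Toy usage with random parameters; a real system trains them with
# DBN pre-training followed by backpropagation, as the abstract describes.
rng = np.random.default_rng(0)
dims = [FEAT_DIM, HIDDEN_DIM, HIDDEN_DIM, NUM_SENONES]
weights = [rng.normal(0.0, 0.01, size=(m, n)) for m, n in zip(dims[:-1], dims[1:])]
biases = [np.zeros(n) for n in dims[1:]]
log_priors = np.full(NUM_SENONES, -np.log(NUM_SENONES))  # uniform-prior assumption
scores = scaled_log_likelihood(rng.normal(size=FEAT_DIM), weights, biases, log_priors)
```

In decoding, these scaled log-likelihoods simply replace the GMM log-likelihoods as emission scores in an otherwise unchanged HMM decoder, which is what lets the DNN slot into an existing CD-GMM-HMM system.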
Feature engineering in context-dependent deep neural networks for conversational speech transcription
We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. Recently, we showed that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third, from 27.4% (obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features) to 18.5%, using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
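The "one third" figure is the relative reduction implied by the two quoted error rates; a quick arithmetic check using only the numbers above:

```python
baseline_wer, dnn_wer = 27.4, 18.5   # % WER, from the abstract
relative_reduction = (baseline_wer - dnn_wer) / baseline_wer
print(f"{relative_reduction:.1%}")   # prints 32.5%, i.e. roughly one third
```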