Speech Research at Google to Enable Universal Speech Interfaces
暂无分享,去创建一个
Heiga Zen | Johan Schalkwyk | Michiel Bacchiani | Alexander Gruenstein | Pedro J. Moreno | Trevor Strohman | Françoise Beaufays | H. Zen | P. Moreno | M. Bacchiani | F. Beaufays | Alexander Gruenstein | Trevor Strohman | J. Schalkwyk | A. Gruenstein
[1] Tara N. Sainath,et al. Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[2] Georg Heigold,et al. End-to-end text-dependent speaker verification , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Arnaud Sahuguet,et al. An audio indexing system for election video material , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[4] Ron J. Weiss,et al. Speech acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Georg Heigold,et al. GMM-free DNN acoustic model training , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[7] Ian McGraw,et al. On the compression of recurrent neural networks with an application to LVCSR acoustic modeling for embedded speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[8] Jiulong Shan,et al. Search by voice in Mandarin Chinese , 2010, INTERSPEECH.
[9] Andrew W. Senior,et al. Fast and accurate recurrent neural network acoustic models for speech recognition , 2015, INTERSPEECH.
[10] Cyril Allauzen,et al. Bayesian Language Model Interpolation for Mobile Speech Input , 2011, INTERSPEECH.
[11] Ian McGraw,et al. Personalized speech recognition on mobile devices , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Johan Schalkwyk,et al. Learning acoustic frame labeling for speech recognition with recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Francoise Beaufays,et al. Google Search by Voice: A Case Study , 2010 .
[14] Tara N. Sainath,et al. FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .
[15] Erik McDermott,et al. Deep neural networks for small footprint text-dependent speaker verification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Thad Hughes,et al. Building transcribed speech corpora quickly and cheaply for many languages , 2010, INTERSPEECH.
[17] Richard M. Stern,et al. Likelihood-maximizing beamforming for robust hands-free speech recognition , 2004, IEEE Transactions on Speech and Audio Processing.
[18] Heiga Zen,et al. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Michiel Bacchiani,et al. Context dependent state tying for speech recognition using deep neural network acoustic models , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Maria Shugrina,et al. Formatting Time-Aligned ASR Transcripts for Readability , 2010, NAACL.
[21] Yannis Agiomyrgiannakis,et al. Vocaine the vocoder and applications in speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[23] Francoise Beaufays,et al. Language Modeling Capitalization , 2013 .
[24] Fuchun Peng,et al. Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Tara N. Sainath,et al. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Tara N. Sainath,et al. Convolutional neural networks for small-footprint keyword spotting , 2015, INTERSPEECH.
[27] Alexander Gruenstein,et al. Accurate and compact large vocabulary speech recognition on mobile devices , 2013, INTERSPEECH.
[28] Johan Schalkwyk,et al. OpenFst: A General and Efficient Weighted Finite-State Transducer Library , 2007, CIAA.
[29] Steve Renals,et al. THE USE OF RECURRENT NEURAL NETWORKS IN CONTINUOUS SPEECH RECOGNITION , 1996 .
[30] Marc'Aurelio Ranzato,et al. Large Scale Distributed Deep Networks , 2012, NIPS.
[31] Heiga Zen,et al. Directly modeling voiced and unvoiced components in speech waveforms by neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[33] Heiga Zen,et al. Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis , 2016, SSW.
[34] Georg Heigold,et al. Asynchronous stochastic optimization for sequence training of deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Heiga Zen,et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices , 2016, INTERSPEECH.
[36] Tara N. Sainath,et al. Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition , 2016, INTERSPEECH.
[37] Joaquín González-Rodríguez,et al. Frame-by-frame language identification in short utterances using deep neural networks , 2015, Neural Networks.
[38] Stephen J. Wright,et al. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.
[39] Rohit Prabhavalkar,et al. Compressing deep neural networks using a rank-constrained topology , 2015, INTERSPEECH.
[40] Zheng-Hua Tan,et al. Speech Recognition on Mobile Devices , 2008, WMMP.
[41] Johan Schalkwyk,et al. Deploying GOOG-411: Early lessons in data, measurement, and testing , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[42] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[43] Fuchun Peng,et al. Pronunciation learning for named-entities through crowd-sourcing , 2014, INTERSPEECH.
[44] Heiga Zen,et al. Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Michiel Bacchiani,et al. Discriminative Features for Language Identification , 2011, INTERSPEECH.
[46] Heiga Zen,et al. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[47] Jonathan Le Roux,et al. Deep Unfolding: Model-Based Inspiration of Novel Deep Architectures , 2014, ArXiv.
[48] Alexander Gutkin,et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer , 2016, INTERSPEECH.
[49] Tara N. Sainath,et al. Locally-connected and convolutional neural networks for small footprint speaker recognition , 2015, INTERSPEECH.
[50] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[51] Tara N. Sainath,et al. Complex Linear Projection (CLP): A Discriminative Approach to Joint Feature Extraction and Acoustic Modeling , 2016, INTERSPEECH.
[52] Tara N. Sainath,et al. Query-by-example keyword spotting using long short-term memory networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..
[54] Michiel Bacchiani,et al. Restoring punctuation and capitalization in transcribed speech , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[55] Georg Heigold,et al. Small-footprint keyword spotting using deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Cyril Allauzen,et al. Improved recognition of contact names in voice commands , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[57] Izhak Shafran,et al. Context dependent phone models for LSTM RNN acoustic modelling , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[59] Heiga Zen. Acoustic Modeling for Speech Synthesis: from HMM to RNN , 2015 .
[60] Tara N. Sainath,et al. Factored spatial and spectral multichannel raw waveform CLDNNs , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[61] Tara N. Sainath,et al. Automatic gain control and multi-style training for robust small-footprint keyword spotting with deep neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[62] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[63] Richard Sproat,et al. The Kestrel TTS text normalization system , 2014, Natural Language Engineering.
[64] Brian Roark,et al. Bringing contextual information to google speech recognition , 2015, INTERSPEECH.
[65] Cyril Allauzen,et al. Written-domain language modeling for automatic speech recognition , 2013, INTERSPEECH.
[66] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[67] Georg Heigold,et al. Sequence discriminative distributed training of long short-term memory recurrent neural networks , 2014, INTERSPEECH.
[68] Cyril Allauzen,et al. Unary Data Structures for Language Models , 2011, INTERSPEECH.
[69] Rohit Prabhavalkar,et al. On the Efficient Representation and Execution of Deep Acoustic Models , 2016, INTERSPEECH.