Speech Processing for Digital Home Assistants
Heiga Zen | Shinji Watanabe | Tomohiro Nakatani | Björn Hoffmeister | Michiel Bacchiani | Mehrez Souden | Reinhold Haeb-Umbach | Michael L. Seltzer
[1] John R. Hershey,et al. Multichannel End-to-end Speech Recognition , 2017, ICML.
[2] Ariya Rastrow,et al. Contextual Language Model Adaptation for Conversational Agents , 2018, INTERSPEECH.
[3] Reinhold Häb-Umbach,et al. A generic neural acoustic beamforming architecture for robust multi-channel speech processing , 2017, Comput. Speech Lang..
[4] R. Maas,et al. A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research , 2016, EURASIP Journal on Advances in Signal Processing.
[5] Biing-Hwang Juang,et al. Blind speech dereverberation with multi-channel linear prediction based on short time fourier transform representation , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[6] Reinhold Häb-Umbach,et al. Blind speech separation employing directional statistics in an Expectation Maximization framework , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[7] Tara N. Sainath,et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home , 2017, INTERSPEECH.
[8] Jonathan Le Roux,et al. Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks , 2016, INTERSPEECH.
[9] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[10] Liang Lu,et al. Deep beamforming networks for multi-channel speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Arun Narayanan,et al. Adaptive Multichannel Dereverberation for Automatic Speech Recognition , 2017, INTERSPEECH.
[12] Jun Wang,et al. Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures , 2018, INTERSPEECH.
[13] Jon Barker,et al. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines , 2018, INTERSPEECH.
[14] Ariya Rastrow,et al. Accurate endpointing with expected pause duration , 2015, INTERSPEECH.
[15] Jont B. Allen,et al. Image method for efficiently simulating small‐room acoustics , 1976 .
[16] Sanjeev Khudanpur,et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Tomohiro Nakatani,et al. Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[18] Georg Heigold,et al. Small-footprint keyword spotting using deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Tomohiro Nakatani,et al. Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition , 2012, IEEE Signal Processing Magazine.
[20] Hui Lin,et al. Recognition of multilingual speech in mobile applications , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Roland Maas,et al. Combining Acoustic Embeddings and Decoding Features for End-of-Utterance Detection in Real-Time Far-Field Speech Recognition Systems , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Richard M. Stern,et al. Likelihood-maximizing beamforming for robust hands-free speech recognition , 2004, IEEE Transactions on Speech and Audio Processing.
[23] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[24] Reinhold Häb-Umbach,et al. Neural network based spectral mask estimation for acoustic beamforming , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Jon Barker,et al. An analysis of environment, microphone and data simulation mismatches in robust speech recognition , 2017, Comput. Speech Lang..
[26] Erik McDermott,et al. Deep neural networks for small footprint text-dependent speaker verification , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Emanuel A. P. Habets,et al. Online Speech Dereverberation Using Kalman Filter and EM Algorithm , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[28] Matt Shannon,et al. Improved End-of-Query Detection for Streaming Speech Recognition , 2017, INTERSPEECH.
[29] Yifan Gong,et al. End-to-End attention based text-dependent speaker verification , 2016, 2016 IEEE Spoken Language Technology Workshop (SLT).
[30] Scott Rickard,et al. Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.
[31] Rémi Gribonval,et al. Under-Determined Reverberant Audio Source Separation Using a Full-Rank Spatial Covariance Model , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[32] Tomohiro Nakatani,et al. Adaptive dereverberation of speech signals with speaker-position change detection , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[33] Tara N. Sainath,et al. Convolutional neural networks for small-footprint keyword spotting , 2015, INTERSPEECH.
[34] Arindam Mandal,et al. Monophone-Based Background Modeling for Two-Stage On-Device Wake Word Detection , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[35] Tomohiro Nakatani,et al. Integrating Neural Network Based Beamforming and Weighted Prediction Error Dereverberation , 2018, INTERSPEECH.
[36] Peter Vary,et al. Multichannel audio database in various acoustic environments , 2014, 2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC).
[37] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Jürgen Schmidhuber,et al. An Application of Recurrent Neural Networks to Discriminative Keyword Spotting , 2007, ICANN.
[39] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Tara N. Sainath,et al. Performance of Mask Based Statistical Beamforming in a Smart Home Scenario , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Tara N. Sainath,et al. Multichannel Signal Processing With Deep Neural Networks for Automatic Speech Recognition , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[42] Mary Harper. The Automatic Speech recognition In Reverberant Environments (ASpIRE) challenge , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[43] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[44] Nikko Strom,et al. Direct modeling of raw audio with DNNs for wake word detection , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[45] Brian Roark,et al. Bringing contextual information to Google speech recognition , 2015, INTERSPEECH.
[46] Sree Hari Krishnan Parthasarathi,et al. Anchored Speech Detection , 2016, INTERSPEECH.
[47] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[48] Zhizheng Wu,et al. Siri On-Device Deep Learning-Guided Unit Selection Text-to-Speech System , 2017, INTERSPEECH.
[49] Reinhold Häb-Umbach,et al. Beamnet: End-to-end training of a beamformer-supported multi-channel ASR system , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[50] Lucas C. Parra,et al. A Survey of Convolutive Blind Source Separation Methods , 2007 .
[51] Chunlei Zhang,et al. End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances , 2017, INTERSPEECH.
[52] Tomohiro Nakatani,et al. Complex angular central Gaussian mixture model for directional statistics in mask-based microphone array signal processing , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).
[53] Sri Harish Reddy Mallidi,et al. Device-directed Utterance Detection , 2018, INTERSPEECH.
[54] Mitch Weintraub,et al. Acoustic Modeling for Google Home , 2017, INTERSPEECH.
[55] Takuya Yoshioka,et al. Exploring Practical Aspects of Neural Mask-Based Beamforming for Far-Field Speech Recognition , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Yuzhou Liu,et al. Neural Network Based Time-Frequency Masking and Steering Vector Estimation for Two-Channel MVDR Beamforming , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).