Deep bottleneck features and sound-dependent i-vectors for simultaneous recognition of speech and environmental sounds
Satoshi Nakamura | Graham Neubig | Sakriani Sakti | Koichiro Yoshino | Seiji Kawanishi
[1] Richard F. Lyon,et al. Machine Hearing: An Emerging Field [Exploratory DSP] , 2010, IEEE Signal Processing Magazine.
[2] Yajie Miao,et al. Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN , 2014, ArXiv.
[3] Yifan Gong,et al. Cross-language knowledge transfer using multilingual deep neural network with shared hidden layers , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[4] Yu Zhang,et al. Speaker adaptation using the i-vector technique for bottleneck features , 2015, INTERSPEECH.
[5] Onur Dikmen,et al. Sound event detection using non-negative dictionaries learned from annotated overlapping events , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[6] Patrick Kenny,et al. Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms , 2006.
[7] Mark J. F. Gales,et al. Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..
[8] E. C. Cmm,et al. on the Recognition of Speech, with , 2008.
[9] Mark J. F. Gales,et al. Semi-tied covariance matrices for hidden Markov models , 1999, IEEE Trans. Speech Audio Process..
[10] Cheung-Chi Leung,et al. Joint acoustic modeling of triphones and trigraphemes by multi-task learning deep neural networks for low-resource speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Andreas Stolcke,et al. SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.
[12] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011.
[13] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993.
[14] Patrick Kenny,et al. Front-End Factor Analysis for Speaker Verification , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[15] Simon King,et al. Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] David Malah,et al. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator , 1984, IEEE Trans. Acoust. Speech Signal Process..
[17] Marcel Gabrea. Robust adaptive Kalman filtering-based speech enhancement algorithm , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[18] Daniel P. W. Ellis,et al. Tandem connectionist feature extraction for conventional HMM systems , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[19] Jan Cernocký,et al. Probabilistic and Bottle-Neck Features for LVCSR of Meetings , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[20] Lemmietta G. McNeilly,et al. American Speech-Language-Hearing Association (ASHA) , 2017.
[21] Jasha Droppo,et al. Multi-task learning in deep neural networks for improved phoneme recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] T. Hirahara,et al. One, two, many: Judging the number of concurrent talkers , 1996.
[23] Li-Rong Dai,et al. A Regression Approach to Speech Enhancement Based on Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[24] George Saon,et al. Speaker adaptation of neural network acoustic models using i-vectors , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[25] Tran Huy Dat,et al. Generalized Gaussian Distribution Kullback-Leibler kernel for robust sound event recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Richard F. Lyon,et al. Machine Hearing: An Emerging Field , 2010.
[27] Ning Ma,et al. The PASCAL CHiME speech separation and recognition challenge , 2013, Comput. Speech Lang..
[28] Georg Heigold,et al. Multilingual acoustic models using distributed deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] Richard M. Schwartz,et al. A compact model for speaker-adaptive training , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[30] Yan Song,et al. Robust sound event recognition using convolutional neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Julien Pinquier,et al. A multi-band spectral subtraction method for enhancing speech corrupted by colored noise , 2002.
[32] Hitoshi Isahara,et al. Spontaneous Speech Corpus of Japanese , 2000, LREC.
[33] Rainer Martin,et al. Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.
[34] Dimitrios Dimitriadis,et al. Investigating factor analysis features for deep neural networks in noisy speech recognition , 2015, INTERSPEECH.
[35] Heikki Huttunen,et al. Polyphonic sound event detection using multi label deep neural networks , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).
[36] Tomohiro Nakatani,et al. The reverb challenge: A common evaluation framework for dereverberation and recognition of reverberant speech , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[37] S. Khanna. Hearing and balance , 1997, Nature.
[38] Patrick Kenny,et al. Eigenvoice modeling with sparse training data , 2005, IEEE Transactions on Speech and Audio Processing.
[39] Ramesh A. Gopinath,et al. Maximum likelihood modeling with Gaussian distributions for classification , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[40] R. Maas,et al. A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research , 2016, EURASIP Journal on Advances in Signal Processing.
[41] Andrew W. Senior,et al. Improving DNN speaker independence with I-vector inputs , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Jon Barker,et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).