Supervised Speech Separation Based on Deep Learning: An Overview
[1] Alex Waibel,et al. Noise reduction using connectionist models , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.
[2] Jun Du,et al. Hierarchical deep neural network for multivariate regression , 2017, Pattern Recognit..
[3] Patrick A. Naylor,et al. Speech Dereverberation , 2010 .
[4] Oldooz Hazrati,et al. Blind binary masking for reverberation suppression in cochlear implants. , 2013, The Journal of the Acoustical Society of America.
[5] Ruth Y Litovsky,et al. Effect of masker type and age on speech intelligibility and spatial release from masking in children and adults. , 2006, The Journal of the Acoustical Society of America.
[6] Richard M. Stern,et al. An analysis of binaural spectro-temporal masking as nonlinear beamforming , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Paris Smaragdis,et al. Adaptive Denoising Autoencoders: A Fine-Tuning Scheme to Learn from Test Mixtures , 2015, LVA/ICA.
[8] DeLiang Wang,et al. On Ideal Binary Mask As the Computational Goal of Auditory Scene Analysis , 2005, Speech Separation by Humans and Machines.
[9] Jürgen Schmidhuber,et al. Deep learning in neural networks: An overview , 2014, Neural Networks.
[10] DeLiang Wang,et al. Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation. , 2006, The Journal of the Acoustical Society of America.
[11] DeLiang Wang,et al. An algorithm to improve speech recognition in noise for hearing-impaired listeners. , 2013, The Journal of the Acoustical Society of America.
[12] Yu Tsao,et al. Raw waveform-based speech enhancement by fully convolutional networks , 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[13] DeLiang Wang,et al. A Tandem Algorithm for Pitch Estimation and Voiced Speech Segregation , 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[14] Mike Brookes,et al. Mask-based enhancement for very low quality speech , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Richard M. Stern,et al. Nonlinear enhancement of onset for robust speech recognition , 2010, INTERSPEECH.
[16] E. C. Cmm,et al. on the Recognition of Speech, with , 2008 .
[17] Takuya Yoshioka,et al. Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] Jesper Jensen,et al. Speech Intelligibility Potential of General and Specialized Deep Neural Network Based Speech Enhancement Systems , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[19] DeLiang Wang,et al. DNN Based Mask Estimation for Supervised Speech Separation , 2018 .
[20] Yoshua Bengio,et al. Scaling learning algorithms towards AI , 2007 .
[21] Bhiksha Raj,et al. Active-Set Newton Algorithm for Overcomplete Non-Negative Representations of Audio , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[22] Philipos C. Loizou,et al. Speech Enhancement: Theory and Practice , 2007 .
[23] L. P. A. S. van Noorden. Temporal coherence in the perception of tone sequences [doctoral dissertation] , 1975 .
[24] Tara N. Sainath,et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition , 2012 .
[25] E. Owens,et al. An Introduction to the Psychology of Hearing , 1997 .
[26] DeLiang Wang,et al. Speech segregation based on sound localization , 2001, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222).
[27] Paul J. Werbos. Backpropagation Through Time: What It Does and How to Do It , 1990 .
[28] D. Wang,et al. The time dimension for scene analysis , 2005, IEEE Transactions on Neural Networks.
[29] Paul J. Werbos,et al. Backpropagation Through Time: What It Does and How to Do It , 1990, Proc. IEEE.
[30] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Hynek Hermansky,et al. RASTA processing of speech , 1994, IEEE Trans. Speech Audio Process..
[32] Jonathan Le Roux,et al. Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[33] Erkki Oja,et al. Independent component analysis: algorithms and applications , 2000, Neural Networks.
[34] DeLiang Wang,et al. Robust speaker identification using auditory features and computational auditory scene analysis , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[35] O. L. Frost,et al. An algorithm for linearly constrained adaptive array processing , 1972 .
[36] Hsiao-Chuan Wang,et al. Robust features for noisy speech recognition based on temporal trajectory filtering of short-time autocorrelation sequences , 1999, Speech Commun..
[37] J. Blauert. Spatial Hearing: The Psychophysics of Human Sound Localization , 1983 .
[38] David H. Wolpert,et al. The Lack of A Priori Distinctions Between Learning Algorithms , 1996, Neural Computation.
[39] Hui Zhang,et al. Deep stacking networks with time series for speech separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[40] Jun Du,et al. Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement , 2017, INTERSPEECH.
[41] DeLiang Wang,et al. A deep ensemble learning method for monaural speech separation , 2016 .
[42] Antonio Bonafonte,et al. SEGAN: Speech Enhancement Generative Adversarial Network , 2017, INTERSPEECH.
[43] B.D. Van Veen,et al. Beamforming: a versatile approach to spatial filtering , 1988, IEEE ASSP Magazine.
[44] Marco Matassoni,et al. An auditory based modulation spectral feature for reverberant speech recognition , 2010, INTERSPEECH.
[45] C. Darwin. Auditory grouping , 1997, Trends in Cognitive Sciences.
[46] Michael S. Brandstein,et al. Microphone Arrays - Signal Processing Techniques and Applications , 2001, Microphone Arrays.
[47] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[48] B. Moore. An introduction to the psychology of hearing, 3rd ed. , 1989 .
[49] Chin-Hui Lee,et al. A Reverberation-Time-Aware Approach to Speech Dereverberation Based on Deep Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[50] Scott Rickard,et al. Blind separation of speech mixtures via time-frequency masking , 2004, IEEE Transactions on Signal Processing.
[51] C. Lam,et al. Musician Enhancement for Speech-In-Noise , 2009, Ear and hearing.
[52] Yang Yu,et al. Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks , 2016, EURASIP J. Audio Speech Music. Process..
[53] Yonggang Hu,et al. Perceptual improvement of deep neural networks for monaural speech enhancement , 2016, 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC).
[54] H Hermansky,et al. Perceptual linear predictive (PLP) analysis of speech. , 1990, The Journal of the Acoustical Society of America.
[55] B. Kollmeier,et al. Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition. , 2012, The Journal of the Acoustical Society of America.
[56] Yi Jiang,et al. Binaural Classification for Reverberant Speech Segregation Using Deep Neural Networks , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[57] Jun Du,et al. A regression approach to binaural speech segregation via deep neural network , 2016, 2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP).
[58] Jesper Jensen,et al. Permutation invariant training of deep models for speaker-independent multi-talker speech separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[59] Hermann Ney,et al. From feedforward to recurrent LSTM neural networks for language modeling , 2015 .
[60] DeLiang Wang,et al. Noise perturbation for supervised speech separation , 2016, Speech Commun..
[61] Ying-Fang Kao,et al. Human and Machine Learning , 2018 .
[62] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[63] Zheng-Hua Tan,et al. Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification , 2017, INTERSPEECH.
[64] J. Licklider,et al. A duplex theory of pitch perception , 1951, Experientia.
[65] DeLiang Wang,et al. A classification based approach to speech segregation. , 2012, The Journal of the Acoustical Society of America.
[66] DeLiang Wang,et al. Speech segregation based on pitch tracking and amplitude modulation , 2001, Proceedings of the 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575).
[67] Björn W. Schuller,et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.
[68] Jesper Jensen,et al. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[69] Masakiyo Fujimoto,et al. Exploring multi-channel features for denoising-autoencoder-based speech enhancement , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[70] Xiaofei Wang,et al. Oracle performance investigation of the ideal masks , 2016, 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC).
[71] Paris Smaragdis,et al. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[72] Mike Brookes,et al. SOBM - a binary mask for noisy speech that optimises an objective intelligibility metric , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[73] B. Shinn-Cunningham. Object-based auditory and visual attention , 2008, Trends in Cognitive Sciences.
[74] Jun Du,et al. Speech separation based on improved deep neural networks with dual outputs of speech features for both target and interfering speakers , 2014, The 9th International Symposium on Chinese Spoken Language Processing.
[75] Li-Rong Dai,et al. A Regression Approach to Speech Enhancement Based on Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[76] Guy J. Brown,et al. Multiple F0 Estimation , 2006 .
[77] Richard F. Lyon,et al. Human and Machine Hearing: Extracting Meaning from Sound , 2017 .
[78] DeLiang Wang,et al. An algorithm to increase speech intelligibility for hearing-impaired listeners in novel segments of the same noise type. , 2015, The Journal of the Acoustical Society of America.
[79] Hiroshi Sawada,et al. Underdetermined blind sparse source separation for arbitrarily arranged multiple sensors , 2007, Signal Process..
[80] Jun Du,et al. Multiple-target deep learning for LSTM-RNN based speech enhancement , 2017, 2017 Hands-free Speech Communications and Microphone Arrays (HSCMA).
[81] Hao Li,et al. Using optimal ratio mask as training target for supervised speech separation , 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).
[82] Chng Eng Siong,et al. On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[83] Pejman Mowlaee,et al. Phase Estimation in Single-Channel Speech Enhancement: Limits-Potential , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[84] Jesper Jensen,et al. Fast noise PSD estimation with low complexity , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[85] Jonathan Le Roux,et al. Deep NMF for speech separation , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[86] R. Plomp,et al. Effects of fluctuating noise and interfering speech on the speech-reception threshold for impaired and normal hearing. , 1990, The Journal of the Acoustical Society of America.
[87] Jonathan Le Roux,et al. Phase Processing for Single-Channel Speech Enhancement: History and recent advances , 2015, IEEE Signal Processing Magazine.
[88] Lauren Calandruccio,et al. Determination of the Potential Benefit of Time-Frequency Gain Manipulation , 2006, Ear and hearing.
[89] DeLiang Wang,et al. Complex Ratio Masking for Monaural Speech Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[90] Hermann Ney,et al. From Feedforward to Recurrent LSTM Neural Networks for Language Modeling , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[91] Chung-Hsien Wu,et al. Fully complex deep neural network for phase-incorporating monaural source separation , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[92] S. Shamma,et al. Temporal coherence and attention in auditory scene analysis , 2011, Trends in Neurosciences.
[93] Xiao-Lei Zhang,et al. Deep Belief Networks Based Voice Activity Detection , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[94] DeLiang Wang,et al. Monaural speech segregation based on pitch tracking and amplitude modulation , 2002, IEEE Transactions on Neural Networks.
[95] DeLiang Wang,et al. Role of mask pattern in intelligibility of ideal binary-masked noisy speech. , 2009, The Journal of the Acoustical Society of America.
[96] DeLiang Wang,et al. Learning spectral mapping for speech dereverberation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[97] Dong Yu,et al. Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[98] Tim Brookes,et al. On the Ideal Ratio Mask as the Goal of Computational Auditory Scene Analysis , 2014 .
[99] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[100] Peter F. Assmann,et al. The Perception of Speech Under Adverse Conditions , 2004 .
[101] Yang Lu,et al. An algorithm that improves speech intelligibility in noise for normal-hearing listeners. , 2009, The Journal of the Acoustical Society of America.
[102] Razvan Pascanu,et al. On the difficulty of training recurrent neural networks , 2012, ICML.
[103] DeLiang Wang,et al. A Feature Study for Classification-Based Speech Separation at Low Signal-to-Noise Ratios , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[104] Tim Pring,et al. Speech perception in noise by monolingual, bilingual and trilingual listeners. , 2010, International journal of language & communication disorders.
[105] Philipos C. Loizou,et al. Reasons why Current Speech-Enhancement Algorithms do not Improve Speech Intelligibility and Suggested Solutions , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[106] C J Darwin,et al. Listening to speech in the presence of other sounds , 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.
[107] Andrew J. Oxenham,et al. Sequential stream segregation of voiced and unvoiced speech sounds based on fundamental frequency , 2017, Hearing Research.
[108] Jun Du,et al. Dynamic noise aware training for speech enhancement based on deep neural networks , 2014, INTERSPEECH.
[109] Liang He,et al. Convolutional maxout neural networks for speech separation , 2015, 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT).
[110] Emily Buss,et al. Spondee Recognition in a Two-Talker Masker and a Speech-Shaped Noise Masker in Adults and Children , 2002, Ear and hearing.
[111] Nicolas Grimault,et al. The Relationship Between Concurrent Speech Segregation, Pitch-Based Streaming of Vowel Sequences, and Frequency Selectivity , 2012 .
[112] Hervé Bourlard,et al. Phase autocorrelation (PAC) derived robust speech features , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..
[113] DeLiang Wang,et al. Features for Masking-Based Monaural Speech Separation in Reverberant Conditions , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[114] Christopher J Rozell,et al. Outcome measures based on classification performance fail to predict the intelligibility of binary-masked speech. , 2016, The Journal of the Acoustical Society of America.
[115] Haizhou Li,et al. Speech dereverberation for enhancement and recognition using dynamic features constrained deep neural networks and feature adaptation , 2016, EURASIP J. Adv. Signal Process..
[116] Anders Krogh,et al. Introduction to the theory of neural computation , 1994, The advanced book program.
[117] DeLiang Wang,et al. A speech enhancement algorithm by iterating single- and multi-microphone processing and its application to robust ASR , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[118] Richard M. Stern,et al. A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition , 2004, Speech Commun..
[119] Jürgen Schmidhuber,et al. Multi-column deep neural networks for image classification , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[120] Jayaganesh Swaminathan,et al. Determining the energetic and informational components of speech-on-speech masking , 2016, The Journal of the Acoustical Society of America.
[121] Ming Tu,et al. Speech enhancement based on Deep Neural Networks with skip connections , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[122] Jun Du,et al. A Maximum Likelihood Approach to Deep Neural Network Based Nonlinear Spectral Mapping for Single-Channel Speech Separation , 2017, INTERSPEECH.
[123] Jeff A. Bilmes,et al. MVA Processing of Speech Features , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[124] D. Wang,et al. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2008, IEEE Trans. Neural Networks.
[125] DeLiang Wang,et al. A two-stage algorithm for one-microphone reverberant speech enhancement , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[126] Josh H. McDermott,et al. Sound Segregation via Embedded Repetition Is Robust to Inattention , 2015, Journal of experimental psychology. Human perception and performance.
[127] Franz Pernkopf,et al. DNN-based speech mask estimation for eigenvector beamforming , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[128] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[129] G. A. Miller,et al. The Trill Threshold , 1950 .
[130] Jonathan Le Roux,et al. Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks , 2016, INTERSPEECH.
[131] Richard M. Stern,et al. Delta-spectral cepstral coefficients for robust speech recognition , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[132] Michael I. Jordan,et al. Learning Spectral Clustering, With Application To Speech Separation , 2006, J. Mach. Learn. Res..
[133] Shrikanth S. Narayanan,et al. Long-Term SNR Estimation of Speech Signals in Known and Unknown Channel Conditions , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[134] Kuldip K. Paliwal,et al. Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition , 2006, Speech Commun..
[135] Wei Jiang,et al. The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio. , 2013, The Journal of the Acoustical Society of America.
[136] Richard F. Lyon,et al. Trainable frontend for robust and far-field keyword spotting , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[137] DeLiang Wang,et al. A Deep Ensemble Learning Method for Monaural Speech Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[138] Tomohiro Nakatani,et al. Integrating DNN-based and spatial clustering-based mask estimation for robust MVDR beamforming , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[139] G. A. Miller,et al. The masking of speech. , 1947, Psychological bulletin.
[140] M R Leek,et al. F0 processing and the separation of competing speech signals by listeners with normal hearing and with hearing loss. , 1998, Journal of speech, language, and hearing research : JSLHR.
[141] Jun Du,et al. SNR-Based Progressive Learning of Deep Neural Network for Speech Enhancement , 2016, INTERSPEECH.
[142] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[143] DeLiang Wang,et al. An algorithm to increase intelligibility for hearing-impaired listeners in the presence of a competing talker. , 2017, The Journal of the Acoustical Society of America.
[144] L. V. Noorden. Temporal coherence in the perception of tone sequences , 1975 .
[145] DeLiang Wang,et al. Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[146] William M. Hartmann,et al. How we localize sound , 1999 .
[147] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[148] Sharon Gannot,et al. A phoneme-based pre-training approach for deep neural network with application to speech enhancement , 2016, 2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC).
[149] Jesper Jensen,et al. An Algorithm for Predicting the Intelligibility of Speech Masked by Modulated Noise Maskers , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[150] Emanuel A. P. Habets,et al. Theory and Applications of Spherical Microphone Array Processing , 2016 .
[151] John R. Hershey,et al. Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[152] Stephen McAdams,et al. Schema-based processing in auditory scene analysis , 2002, Perception & psychophysics.
[153] DeLiang Wang,et al. Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[154] DeLiang Wang,et al. A Supervised Learning Approach to Monaural Segregation of Reverberant Speech , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[155] Rhee Man Kil,et al. Auditory processing of speech signals for robust speech recognition in real-world noisy environments , 1999, IEEE Trans. Speech Audio Process..
[156] Feng Huang,et al. Pitch Estimation in Noisy Speech Using Accumulated Peak Spectrum and Sparse Estimation Technique , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[157] Peter Norvig,et al. Artificial Intelligence: A Modern Approach , 1995 .
[158] DeLiang Wang,et al. Speech intelligibility in background noise with ideal binary time-frequency masking. , 2009, The Journal of the Acoustical Society of America.
[159] DeLiang Wang,et al. Binary and ratio time-frequency masks for robust speech recognition , 2006, Speech Commun..
[160] DeLiang Wang,et al. Towards Scaling Up Classification-Based Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[161] DeLiang Wang,et al. A deep neural network for time-domain signal reconstruction , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[162] S. Boll,et al. Suppression of acoustic noise in speech using spectral subtraction , 1979 .
[163] Paris Smaragdis,et al. Deep learning for monaural speech separation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[164] DeLiang Wang,et al. Deep learning reinvents the hearing aid , 2017, IEEE Spectrum.
[165] Jessica M. Foxton,et al. Effects of attention and unilateral neglect on auditory stream segregation. , 2001, Journal of experimental psychology. Human perception and performance.
[166] Chengzhu Yu,et al. The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[167] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[168] DeLiang Wang,et al. Exploring Monaural Features for Classification-Based Speech Segregation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[169] DeLiang Wang,et al. Long Short-Term Memory for Speaker Generalization in Supervised Speech Separation , 2016, INTERSPEECH.
[170] Jun Du,et al. A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[171] DeLiang Wang,et al. A two-stage algorithm for noisy and reverberant speech enhancement , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[172] DeLiang Wang,et al. Deep Learning Based Binaural Speech Separation in Reverberant Environments , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[173] DeLiang Wang,et al. Boosting Classification Based Speech Separation Using Temporal Dynamics , 2012, INTERSPEECH.
[174] Boaz Rafaely,et al. Microphone Array Signal Processing , 2008 .
[175] DeLiang Wang,et al. An Unsupervised Approach to Cochannel Speech Separation , 2013, IEEE Transactions on Audio, Speech, and Language Processing.
[176] Richard F. Lyon. A computational model of binaural localization and separation , 1983, ICASSP.
[177] Jun Du,et al. Speech separation of a target speaker based on deep neural networks , 2014, 2014 12th International Conference on Signal Processing (ICSP).
[178] Jürgen Schmidhuber,et al. Highway Networks , 2015, ArXiv.
[179] Pejman Mowlaee Begzade Mahale,et al. Phase Estimation in Single Channel Speech Enhancement Using Phase Decomposition , 2015, IEEE Signal Processing Letters.
[180] Zhong-Qiu Wang,et al. Phoneme-specific speech separation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[181] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[182] Guoning Hu,et al. Monaural speech segregation based on pitch tracking and amplitude modulation , 2002, ICASSP.
[183] DeLiang Wang,et al. Long short-term memory for speaker generalization in supervised speech separation. , 2017, The Journal of the Acoustical Society of America.
[184] DeLiang Wang,et al. Towards Scaling Up Classification-Based Speech Separation , 2013 .
[185] Terrence J. Sejnowski,et al. An Information-Maximization Approach to Blind Separation and Blind Deconvolution , 1995, Neural Computation.
[186] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[187] Zhuo Chen,et al. Deep clustering: Discriminative embeddings for segmentation and separation , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[188] Nima Mesgarani,et al. Deep attractor network for single-microphone speaker separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[189] P. Loizou,et al. Factors influencing intelligibility of ideal binary-masked speech: implications for noise reduction. , 2008, The Journal of the Acoustical Society of America.
[190] Jon Barker,et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[191] Hideki Kashioka,et al. Speech restoration based on deep learning autoencoder with layer-wised pretraining , 2012, INTERSPEECH.
[192] DeLiang Wang,et al. Cocktail Party Processing via Structured Prediction , 2012, NIPS.
[193] Emmanuel Vincent,et al. Multichannel Audio Source Separation With Deep Neural Networks , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[194] R. W. Hukin,et al. Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. , 2000, The Journal of the Acoustical Society of America.
[195] DeLiang Wang,et al. Large-scale training to increase speech intelligibility for hearing-impaired listeners in novel noises. , 2016, The Journal of the Acoustical Society of America.
[196] Richard M. Stern,et al. Power-Normalized Cepstral Coefficients (PNCC) for Robust Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[197] Reinhold Häb-Umbach,et al. Neural network based spectral mask estimation for acoustic beamforming , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[198] DeLiang Wang,et al. Speaker-dependent multipitch tracking using deep neural networks , 2017, INTERSPEECH.
[199] Yu Tsao,et al. SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement , 2016, INTERSPEECH.
[201] Hynek Hermansky,et al. Study on the dereverberation of speech based on temporal envelope filtering , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.
[202] DeLiang Wang,et al. Time-Frequency Masking for Speech Separation and Its Potential for Hearing Aid Design , 2008 .
[203] Björn W. Schuller,et al. Discriminatively trained recurrent neural networks for single-channel speech separation , 2014, 2014 IEEE Global Conference on Signal and Information Processing (GlobalSIP).
[204] DeLiang Wang,et al. Neural Network Based Pitch Tracking in Very Noisy Speech , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[205] Y. Ephraim,et al. Speech enhancement using a minimum mean square error short-time spectral amplitude estimator , 1984 .
[206] Yee Whye Teh,et al. A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.
[207] Jinwon Lee,et al. A Fully Convolutional Neural Network for Speech Enhancement , 2016, INTERSPEECH.
[208] Chng Eng Siong,et al. Combining non-negative matrix factorization and deep neural networks for speech enhancement and automatic speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[209] Tao Zhang,et al. Learning Spectral Mapping for Speech Dereverberation and Denoising , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[210] J. Mccroskey,et al. Human Communication , 2008 .
[211] David H. Wolpert. The Lack of A Priori Distinctions Between Learning Algorithms , 1996 .
[212] Tara N. Sainath,et al. Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition , 2016, INTERSPEECH.
[213] Yu Tsao,et al. Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.
[214] Hui Zhang,et al. Multi-Target Ensemble Learning for Monaural Speech Separation , 2017, INTERSPEECH.
[215] Andries P. Hekstra,et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[216] Hui Zhang,et al. A Pairwise Algorithm Using the Deep Stacking Network for Speech Separation and Pitch Estimation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[217] Jun Du,et al. A Gender Mixture Detection Approach to Unsupervised Single-Channel Speech Separation Based on Deep Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[218] Jesper Jensen,et al. MMSE based noise PSD tracking with low complexity , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.