Exploiting time-frequency patterns with LSTM-RNNs for low-bitrate audio restoration
Björn Schuller | Florian Eyben | Dagmar Schuller | Eunmi Oh | Jun Deng | Holly Francois | Zixing Zhang
[1] Jürgen Schmidhuber,et al. Multi-dimensional Recurrent Neural Networks , 2007, ICANN.
[2] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[3] Yun Lei,et al. Application of convolutional neural networks to speaker recognition in noisy conditions , 2014, INTERSPEECH.
[4] Karlheinz Brandenburg,et al. MP3 and AAC Explained , 1999 .
[5] Ted Painter,et al. Audio Signal Processing and Coding , 2007 .
[6] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[7] Erik Marchi,et al. Sparse Autoencoder-Based Feature Transfer Learning for Speech Emotion Recognition , 2013, 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction.
[8] Alex Graves,et al. Supervised Sequence Labelling with Recurrent Neural Networks , 2012, Studies in Computational Intelligence.
[9] Peyman Abbaszadeh. Improving Hydrological Process Modeling Using Optimized Threshold-Based Wavelet De-Noising Technique , 2016, Water Resources Management.
[10] Cha Zhang,et al. CROWDMOS: An approach for crowdsourcing mean opinion score studies , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] P. J. Werbos. Backpropagation Through Time: What It Does and How to Do It , 1990 .
[12] Colin Raffel,et al. librosa: Audio and Music Signal Analysis in Python , 2015, SciPy.
[13] Jacob Benesty,et al. Spectral Enhancement Methods , 2009 .
[14] A. Spanias,et al. Perceptual coding of digital audio , 2000, Proceedings of the IEEE.
[15] Chong Wang,et al. Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin , 2015, ICML.
[16] Quoc V. Le,et al. Recurrent Neural Networks for Noise Reduction in Robust ASR , 2012, INTERSPEECH.
[17] Björn W. Schuller,et al. Deep neural networks for anger detection from real life speech data , 2017, 2017 Seventh International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW).
[18] Jürgen Schmidhuber,et al. Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation , 2015, NIPS.
[19] Lianhong Cai,et al. Spectro-Temporal Modelling with Time-Frequency LSTM and Structured Output Layer for Voice Conversion , 2017, INTERSPEECH.
[20] George Saon,et al. The IBM 2016 English Conversational Telephone Speech Recognition System , 2016, INTERSPEECH.
[21] Sascha Disch,et al. A harmonic bandwidth extension method for audio codecs , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[22] Adam Coates,et al. Deep Voice: Real-time Neural Text-to-Speech , 2017, ICML.
[23] Vincent Dumoulin,et al. Deconvolution and Checkerboard Artifacts , 2016 .
[24] Chin-Hui Lee,et al. DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech , 2015, INTERSPEECH.
[25] Ronaldus Maria Aarts,et al. Audio Bandwidth Extension: Application of Psychoacoustics, Signal Processing and Loudspeaker Design , 2004 .
[26] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[27] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[28] Vivienne Sze,et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey , 2017, Proceedings of the IEEE.
[29] Leszek Morzyński,et al. Application of Neural Networks in Active Noise Reduction Systems , 2003, International journal of occupational safety and ergonomics : JOSE.
[30] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[31] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017 .
[32] Hassan Khotanlou,et al. An empirical technique for predicting noise exposure level in the typical embroidery workrooms using artificial neural networks , 2013 .
[33] A. Gray,et al. Distance measures for speech processing , 1976 .
[34] Geoffrey Zweig,et al. LSTM time and frequency recurrence for automatic speech recognition , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[35] Kristofer Kjörling,et al. Spectral Band Replication, a Novel Approach in Audio Coding , 2002 .
[36] Geoffrey Zweig,et al. Exploring multidimensional lstms for large vocabulary ASR , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Brian Kingsbury,et al. Very deep multilingual convolutional neural networks for LVCSR , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[39] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[40] Kuldip K. Paliwal,et al. Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..
[41] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[42] Björn W. Schuller,et al. Universum Autoencoder-Based Domain Adaptation for Speech Emotion Recognition , 2017, IEEE Signal Processing Letters.
[43] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[44] Mao-shen Jia,et al. A harmonic bandwidth extension based on Gaussian mixture model , 2010, IEEE 10th INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS.
[45] Paavo Alku,et al. Speech bandwidth extension using Gaussian mixture model-based estimation of the highband mel spectrum , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Marek Domanski,et al. Improved coding of tonal components in MPEG-4 AAC with SBR , 2008, 2008 16th European Signal Processing Conference.
[47] Chin-Hui Lee,et al. A deep neural network approach to speech bandwidth expansion , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[48] DeLiang Wang,et al. Time and frequency domain long short-term memory for noise robust pitch tracking , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Jae Lim,et al. Signal estimation from modified short-time Fourier transform , 1984 .
[50] Paavo Alku,et al. Bandwidth Extension of Telephone Speech to Low Frequencies Using Sinusoidal Synthesis and a Gaussian Mixture Model , 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[51] Chi-Min Liu,et al. Compression Artifacts in Perceptual Audio Coding , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[52] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[53] Stefano Ermon,et al. Audio Super Resolution using Neural Networks , 2017, ICLR.
[54] Björn W. Schuller,et al. Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition , 2014, IEEE Signal Processing Letters.
[55] Chih-Wei Wu,et al. Blind bandwidth extension using K-means and Support Vector Regression , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[56] Juhan Nam,et al. Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms , 2017, ArXiv.
[57] Jinwon Lee,et al. A Fully Convolutional Neural Network for Speech Enhancement , 2016, INTERSPEECH.
[58] Sascha Disch,et al. A continuous modulated single sideband bandwidth extension , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[59] Heiko Purnhagen,et al. A Closer Look into MPEG-4 High Efficiency AAC , 2003 .