Deep Learning for Audio Signal Processing
暂无分享,去创建一个
Tara N. Sainath | Bo Li | Hendrik Purwins | Tuomas Virtanen | Shuo-Yiin Chang | Jan Schlüter | Tara Sainath | T. Virtanen | Shuo-yiin Chang | H. Purwins | Bo Li | Jan Schlüter | Hendrik Purwins
[1] Meinard Müller,et al. A Single-Step Approach to Musical Tempo Estimation Using a Convolutional Neural Network , 2018, ISMIR.
[2] Geoffrey Zweig,et al. Advances in all-neural speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[3] Hermann Ney,et al. Acoustic modeling with deep neural networks using raw time signal for LVCSR , 2014, INTERSPEECH.
[4] Geoffrey Zweig,et al. LSTM time and frequency recurrence for automatic speech recognition , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[5] Arun Narayanan,et al. Toward Domain-Invariant Speech Recognition via Large Scale Training , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[6] Klaus Obermayer,et al. A new method for tracking modulations in tonal music in audio data format , 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural Computing: New Challenges and Perspectives for the New Millennium.
[7] Jorge Calvo-Zaragoza,et al. An End-to-end Framework for Audio-to-Score Music Transcription on Monophonic Excerpts , 2018, ISMIR.
[8] Thomas Grill,et al. Exploring Data Augmentation for Improved Singing Voice Detection with Neural Networks , 2015, ISMIR.
[9] Alex Graves,et al. Sequence Transduction with Recurrent Neural Networks , 2012, ArXiv.
[10] DeLiang Wang,et al. Supervised Speech Separation Based on Deep Learning: An Overview , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[11] Jürgen Schmidhuber,et al. Multi-dimensional Recurrent Neural Networks , 2007, ICANN.
[12] Juhan Nam,et al. Sample-level Deep Convolutional Neural Networks for Music Auto-tagging Using Raw Waveforms , 2017, ArXiv.
[13] Liang Lu,et al. On training the recurrent neural network encoder-decoder for large vocabulary end-to-end speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Justin Salamon,et al. Deep Salience Representations for F0 Estimation in Polyphonic Music , 2017, ISMIR.
[15] Georg Heigold,et al. Sequence discriminative distributed training of long short-term memory recurrent neural networks , 2014, INTERSPEECH.
[16] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[17] Naoyuki Kanda,et al. Elastic spectral distortion for low resource speech recognition with deep neural networks , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[18] Léon Bottou,et al. Wasserstein Generative Adversarial Networks , 2017, ICML.
[19] Allen Huang,et al. Deep Learning for Music , 2016, ArXiv.
[20] Tara N. Sainath,et al. State-of-the-Art Speech Recognition with Sequence-to-Sequence Models , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[21] Andrew W. Senior,et al. Long short-term memory recurrent neural network architectures for large scale acoustic modeling , 2014, INTERSPEECH.
[22] Navdeep Jaitly,et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks , 2014, ICML.
[23] Rémi Gribonval,et al. Performance measurement in blind audio source separation , 2006, IEEE Transactions on Audio, Speech, and Language Processing.
[24] Geoffrey E. Hinton,et al. Understanding how Deep Belief Networks perform acoustic modelling , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[25] Tara N. Sainath,et al. Bytes Are All You Need: End-to-end Multilingual Speech Recognition and Synthesis with Bytes , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Heikki Huttunen,et al. Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[27] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[28] Juan Pablo Bello,et al. A Software Framework for Musical Data Augmentation , 2015, ISMIR.
[29] Alfred O. Hero,et al. Complex input convolutional neural networks for wide angle SAR ATR , 2016, 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP).
[30] Nima Mesgarani,et al. Deep attractor network for single-microphone speaker separation , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Liang Lu,et al. Deep beamforming networks for multi-channel speech recognition , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[32] Emmanuel Vincent,et al. Multichannel Audio Source Separation With Deep Neural Networks , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[34] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[35] Florian Krebs,et al. Joint Beat and Downbeat Tracking with Recurrent Neural Networks , 2016, ISMIR.
[36] Jesper Jensen,et al. Monaural Speech Enhancement Using Deep Neural Networks by Maximizing a Short-Time Objective Intelligibility Measure , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[37] Geoffrey E. Hinton,et al. Deep Learning , 2015, Nature.
[38] Yu Tsao,et al. SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement , 2016, INTERSPEECH.
[39] Quoc V. Le,et al. Listen, Attend and Spell , 2015, ArXiv.
[40] Marco Cuturi,et al. Soft-DTW: a Differentiable Loss Function for Time-Series , 2017, ICML.
[41] Hemant A. Patil,et al. Novel Unsupervised Auditory Filterbank Learning Using Convolutional RBM for Speech Recognition , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[42] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[43] Hagen Soltau,et al. Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition , 2016, INTERSPEECH.
[44] Gaëtan Hadjeres,et al. Deep Learning Techniques for Music Generation - A Survey , 2017, ArXiv.
[45] DeLiang Wang,et al. A New Framework for Supervised Speech Enhancement in the Time Domain , 2018, INTERSPEECH.
[46] Adam Lopez,et al. Towards speech-to-text translation without speech recognition , 2017, EACL.
[47] Mark B. Sandler,et al. Automatic Tagging Using Deep Convolutional Neural Networks , 2016, ISMIR.
[48] Archontis Politis,et al. Direction of Arrival Estimation for Multiple Sound Sources Using Convolutional Recurrent Neural Network , 2017, 2018 26th European Signal Processing Conference (EUSIPCO).
[49] Jun Du,et al. An Experimental Study on Speech Enhancement Based on Deep Neural Networks , 2014, IEEE Signal Processing Letters.
[50] Geoffrey Zweig,et al. Exploring multidimensional lstms for large vocabulary ASR , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[51] Tatsuya Harada,et al. Learning from Between-class Examples for Deep Sound Recognition , 2017, ICLR.
[52] Sandeep Subramanian,et al. Deep Complex Networks , 2017, ICLR.
[53] Paris Smaragdis,et al. Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[54] Tara N. Sainath,et al. Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home , 2017, INTERSPEECH.
[55] 渡邊 澄夫. Algebraic geometry and statistical learning theory , 2009 .
[56] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[57] F ROSENBLATT,et al. The perceptron: a probabilistic model for information storage and organization in the brain. , 1958, Psychological review.
[58] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[59] Tara N. Sainath,et al. Modeling Time-Frequency Patterns with LSTM vs. Convolutional Architectures for LVCSR Tasks , 2016, INTERSPEECH.
[60] Tuomas Virtanen,et al. Filterbank learning for deep neural network based polyphonic sound event detection , 2016, 2016 International Joint Conference on Neural Networks (IJCNN).
[61] Zachary Chase Lipton. A Critical Review of Recurrent Neural Networks for Sequence Learning , 2015, ArXiv.
[62] Khe Chai Sim,et al. A Spectral Masking Approach to Noise-Robust Speech Recognition Using Deep Neural Networks , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[63] Tara N. Sainath,et al. Endpoint Detection Using Grid Long Short-Term Memory Networks for Streaming Speech Recognition , 2017, INTERSPEECH.
[64] Gaël Richard,et al. Robust Downbeat Tracking Using an Ensemble of Convolutional Networks , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[65] Erich Elsen,et al. Efficient Neural Audio Synthesis , 2018, ICML.
[66] Thomas Grill,et al. Boundary Detection in Music Structure Analysis using Convolutional Neural Networks , 2014, ISMIR.
[67] Tara N. Sainath,et al. Improvements to Deep Convolutional Neural Networks for LVCSR , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[68] M. Picheny,et al. Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences , 2017 .
[69] Jan Schlüter,et al. Learning to Pinpoint Singing Voice from Weakly Labeled Examples , 2016, ISMIR.
[70] Jacob Benesty,et al. Fundamentals of Noise Reduction , 2008 .
[71] Tara N. Sainath,et al. Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[72] Navdeep Jaitly,et al. Vocal Tract Length Perturbation (VTLP) improves speech recognition , 2013 .
[73] Kevin Skadron,et al. A performance study of general-purpose applications on graphics processors using CUDA , 2008, J. Parallel Distributed Comput..
[74] Slim Essid,et al. Analysis of Common Design Choices in Deep Learning Systems for Downbeat Tracking , 2018, ISMIR.
[75] Yoshua Bengio,et al. Speaker Recognition from Raw Waveform with SincNet , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[76] Zaïd Harchaoui,et al. Learning Features of Music from Scratch , 2016, ICLR.
[77] Geoffrey E. Hinton,et al. Deep Belief Networks for phone recognition , 2009 .
[78] Gerhard Widmer,et al. Feature Learning for Chord Recognition: The Deep Chroma Extractor , 2016, ISMIR.
[79] Tara N. Sainath,et al. Temporal Modeling Using Dilated Convolution and Gating for Voice-Activity-Detection , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[80] Benjamin Schrauwen,et al. End-to-end learning for music audio , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[81] Douglas Eck,et al. A Supervised Classification Algorithm for Note Onset Detection , 2006, EURASIP J. Adv. Signal Process..
[82] DeLiang Wang,et al. A Feature Study for Classification-Based Speech Separation at Low Signal-to-Noise Ratios , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[83] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[84] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[85] Tara N. Sainath,et al. Learning filter banks within a deep neural network framework , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[86] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[87] Geoffrey E. Hinton,et al. Learning a better representation of speech soundwaves using restricted boltzmann machines , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[88] DeLiang Wang,et al. On Training Targets for Supervised Speech Separation , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[89] Vaibhava Goel,et al. Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition , 2016, ArXiv.
[90] Cem Anil,et al. TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer , 2018, ICLR.
[91] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[92] Lawrence R. Rabiner,et al. Automatic Speech Recognition - A Brief History of the Technology Development , 2004 .
[93] Tatsuya Kawahara,et al. Cross-domain speech recognition using nonparallel corpora with cycle-consistent adversarial networks , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[94] Annamaria Mesaros,et al. Metrics for Polyphonic Sound Event Detection , 2016 .
[95] Ron J. Weiss,et al. Speech acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[96] Daniel P. W. Ellis,et al. Better beat tracking through robust onset aggregation , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[97] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[98] S. Furui,et al. Speaker-independent isolated word recognition based on emphasized spectral dynamics , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.
[99] Sanjeev Khudanpur,et al. Audio augmentation for speech recognition , 2015, INTERSPEECH.
[100] Mark J. F. Gales,et al. Improving the interpretability of deep neural networks with stimulated learning , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[101] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.
[102] Bo Chen,et al. High-quality Voice Conversion Using Spectrogram-Based WaveNet Vocoder , 2018, INTERSPEECH.
[103] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[104] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process..
[105] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[106] Sanjeev Khudanpur,et al. Acoustic Modelling from the Signal Domain Using CNNs , 2016, INTERSPEECH.
[107] Sebastian Böck,et al. Improved musical onset detection with Convolutional Neural Networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[108] Thierry Bertin-Mahieux,et al. The Million Song Dataset , 2011, ISMIR.
[109] Heiga Zen,et al. Parallel WaveNet: Fast High-Fidelity Speech Synthesis , 2017, ICML.
[110] Tara N. Sainath,et al. FUNDAMENTAL TECHNOLOGIES IN MODERN SPEECH RECOGNITION Digital Object Identifier 10.1109/MSP.2012.2205597 , 2012 .
[111] Stefan B. Williams,et al. Sound Source Localization in a Multipath Environment Using Convolutional Neural Networks , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[112] Heikki Huttunen,et al. Recurrent neural networks for polyphonic sound event detection in real life recordings , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[113] DeLiang Wang,et al. Ideal ratio mask estimation using deep neural networks for robust speech recognition , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[114] Sander Dieleman,et al. Piano Genie , 2018, IUI.
[115] Karen Simonyan,et al. Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders , 2017, ICML.
[116] Dimitri Palaz,et al. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks , 2013, INTERSPEECH.
[117] Ph. Tchamitchian,et al. Wavelets: Time-Frequency Methods and Phase Space , 1992 .
[118] Juan Pablo Bello,et al. Structured Training for Large-Vocabulary Chord Recognition , 2017, ISMIR.
[119] Steve Renals,et al. THE USE OF RECURRENT NEURAL NETWORKS IN CONTINUOUS SPEECH RECOGNITION , 1996 .
[120] Yong Xu,et al. Iterative Deep Neural Networks for Speaker-Independent Binaural Blind Speech Separation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[121] Paris Smaragdis,et al. Generative Adversarial Source Separation , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[122] Bob L. Sturm,et al. Local Interpretable Model-Agnostic Explanations for Music Content Analysis , 2017, ISMIR.
[123] Yu Tsao,et al. End-to-End Waveform Utterance Enhancement for Direct Evaluation Metrics Optimization by Fully Convolutional Neural Networks , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[124] Chris Donahue,et al. Synthesizing Audio with Generative Adversarial Networks , 2018, ArXiv.
[125] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[126] Vincent Lostanlen,et al. Deep Convolutional Networks on the Pitch Spiral For Music Instrument Recognition , 2016, ISMIR.
[127] Jonathan Le Roux,et al. Single-Channel Multi-Speaker Separation Using Deep Clustering , 2016, INTERSPEECH.
[128] James R. Glass,et al. Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[129] Chris Donahue,et al. Exploring Speech Enhancement with Generative Adversarial Networks for Robust Speech Recognition , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[130] Björn W. Schuller,et al. Universal Onset Detection with Bidirectional Long Short-Term Memory Neural Networks , 2010, ISMIR.
[131] Jürgen Schmidhuber,et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks , 2006, ICML.
[132] Zheng-Hua Tan,et al. Conditional Generative Adversarial Networks for Speech Enhancement and Noise-Robust Speaker Verification , 2017, INTERSPEECH.
[133] Yoshua Bengio,et al. SampleRNN: An Unconditional End-to-End Neural Audio Generation Model , 2016, ICLR.
[134] Joaquín González-Rodríguez,et al. Automatic language identification using deep neural networks , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[135] Antonio Bonafonte,et al. SEGAN: Speech Enhancement Generative Adversarial Network , 2017, INTERSPEECH.
[136] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[137] Emanuel A. P. Habets,et al. Multi-Speaker Localization Using Convolutional Neural Network Trained with Noise , 2017, ArXiv.
[138] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[139] Yu Zhang,et al. Very deep convolutional networks for end-to-end speech recognition , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[140] Francesco Piazza,et al. A neural network based algorithm for speaker localization in a multi-room environment , 2016, 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP).
[141] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.
[142] Jeffrey L. Elman,et al. Finding Structure in Time , 1990, Cogn. Sci..
[143] Björn W. Schuller,et al. Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR , 2015, LVA/ICA.
[144] Li-Rong Dai,et al. A Regression Approach to Speech Enhancement Based on Deep Neural Networks , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[145] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[146] Mathieu Lagrange,et al. Detection and Classification of Acoustic Scenes and Events: Outcome of the DCASE 2016 Challenge , 2018, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[147] Tara N. Sainath,et al. Neural Network Adaptive Beamforming for Robust Multichannel Speech Recognition , 2016, INTERSPEECH.
[148] Yu Tsao,et al. Speech enhancement based on deep denoising autoencoder , 2013, INTERSPEECH.
[149] Tara N. Sainath,et al. A Comparison of Sequence-to-Sequence Models for Speech Recognition , 2017, INTERSPEECH.
[150] Juan Pablo Bello,et al. Rethinking Automatic Chord Recognition with Convolutional Neural Networks , 2012, 2012 11th International Conference on Machine Learning and Applications.
[151] DeLiang Wang,et al. Complex Ratio Masking for Monaural Speech Separation , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.