On Using Backpropagation for Speech Texture Generation and Voice Conversion
Samy Bengio | Ron J. Weiss | Rif A. Saurous | Jan Chorowski
[1] Béla Julesz, et al. Visual Pattern Discrimination, 1962, IRE Trans. Inf. Theory.
[2] Martín Abadi, et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, 2016, ArXiv.
[3] Leon A. Gatys, et al. Texture Synthesis Using Convolutional Neural Networks, 2015, NIPS.
[4] Lonce L. Wyse, et al. Audio Spectrogram Representations for Processing with Convolutional Neural Networks, 2017, ArXiv.
[5] Eero P. Simoncelli, et al. A Parametric Texture Model Based on Joint Statistics of Complex Wavelet Coefficients, 2000, International Journal of Computer Vision.
[6] Jonathon Shlens, et al. A Learned Representation For Artistic Style, 2016, ICLR.
[7] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.
[8] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[9] Ying Zhang, et al. Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks, 2016, INTERSPEECH.
[10] Navdeep Jaitly, et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks, 2014, ICML.
[11] Andrew Zisserman, et al. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps, 2013, ICLR.
[12] Tomoki Toda, et al. The Voice Conversion Challenge 2016, 2016, INTERSPEECH.
[13] Jae Lim, et al. Signal estimation from modified short-time Fourier transform, 1984.
[14] Yan Wang, et al. A Powerful Generative Model Using Random Weights for the Deep Image Representation, 2016, NIPS.
[15] Xiaoou Tang, et al. Learning a Deep Convolutional Network for Image Super-Resolution, 2014, ECCV.
[16] Heiga Zen, et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices, 2016, INTERSPEECH.
[17] Li Fei-Fei, et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution, 2016, ECCV.
[18] Hervé Bredin, et al. TristouNet: Triplet loss for speaker turn embedding, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] Andrea Vedaldi, et al. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images, 2016, ICML.
[20] Jürgen Schmidhuber, et al. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, 2006, ICML.
[21] Soumith Chintala, et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015, ICLR.
[22] Aren Jansen, et al. CNN architectures for large-scale audio classification, 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Yoshua Bengio, et al. Attention-Based Models for Speech Recognition, 2015, NIPS.
[24] Samy Bengio, et al. Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model, 2017, ArXiv.
[25] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[26] Daniel Povey, et al. The Kaldi Speech Recognition Toolkit, 2011.
[27] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.
[28] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[29] Eero P. Simoncelli, et al. Sound texture synthesis via filter statistics, 2009, 2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.
[30] Jorge Nocedal, et al. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization, 1997, TOMS.
[31] Jonathon Shlens, et al. Explaining and Harnessing Adversarial Examples, 2014, ICLR.
[32] Leon A. Gatys, et al. A Neural Algorithm of Artistic Style, 2015, ArXiv.
[33] Tara N. Sainath, et al. Learning the speech front-end with raw waveform CLDNNs, 2015, INTERSPEECH.
[34] George Kurian, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, 2016, ArXiv.
[35] Eero P. Simoncelli, et al. Sound Texture Perception via Statistics of the Auditory Periphery: Evidence from Sound Synthesis, 2022.
[36] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.
[37] Adam Roberts, et al. Audio Deepdream: Optimizing raw audio with convolutional networks, 2016.