[1] Junichi Yamagishi, et al. A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis, 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[2] Sanjiv Kumar, et al. On the Convergence of Adam and Beyond, 2018.
[3] Lauri Juvela, et al. Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort, 2014, INTERSPEECH.
[4] Antonio Torralba, et al. SoundNet: Learning Sound Representations from Unlabeled Video, 2016, NIPS.
[5] Nicolas Usunier, et al. End-to-End Speech Recognition From the Raw Waveform, 2018, INTERSPEECH.
[6] E. Owens, et al. An Introduction to the Psychology of Hearing, 1997.
[7] Heiga Zen, et al. WaveNet: A Generative Model for Raw Audio, 2016, SSW.
[8] Jan Skoglund, et al. LPCNet: Improving Neural Speech Synthesis through Linear Prediction, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[9] Bhuvana Ramabhadran, et al. An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.
[10] Ryan Prenger, et al. WaveGlow: A Flow-based Generative Network for Speech Synthesis, 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Alan W. Black, et al. The CMU Arctic speech databases, 2004, SSW.
[12] Koray Kavukcuoglu, et al. Pixel Recurrent Neural Networks, 2016, ICML.
[13] Adam Finkelstein, et al. FFTNet: A Real-Time Speaker-Dependent Neural Vocoder, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[14] Yannis Agiomyrgiannakis, et al. Vocaine the vocoder and applications in speech synthesis, 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Kirill Sakhnov, et al. Approach for Energy-Based Voice Detector with Adaptive Scaling Factor, 2009.
[16] Hideki Kawahara, et al. STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds, 2006.
[17] Yannis Stylianou, et al. Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification, 1996.
[18] Per Hedelin. A tone oriented voice excited vocoder, 1981, ICASSP.
[19] Masanori Morise, et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications, 2016, IEICE Trans. Inf. Syst.
[20] Yoshua Bengio, et al. Speech and Speaker Recognition from Raw Waveform with SincNet, 2018, ArXiv.
[21] Thomas F. Quatieri, et al. Speech analysis/Synthesis based on a sinusoidal representation, 1986, IEEE Trans. Acoust. Speech Signal Process.
[22] Yu Tsao, et al. Raw waveform-based speech enhancement by fully convolutional networks, 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).