Heiga Zen | Alex Graves | Oriol Vinyals | Karen Simonyan | Koray Kavukcuoglu | Sander Dieleman | Aäron van den Oord | Nal Kalchbrenner | Andrew W. Senior
[1] T. Chiba. The vowel, its nature and structure , 1958 .
[2] Gunnar Fant,et al. Acoustic Theory Of Speech Production , 1960 .
[3] F. Itakura,et al. A statistical method for estimation of speech spectral density and formant frequencies , 1970 .
[4] F. Itakura. Line spectrum representation of linear predictor coefficients of speech signals , 1975 .
[5] A. B. Poritz,et al. Linear predictive hidden Markov models and the speech signal , 1982, ICASSP.
[6] Biing-Hwang Juang,et al. Mixture autoregressive hidden Markov models for speech signals , 1985, IEEE Trans. Acoust. Speech Signal Process.
[7] Eric Moulines,et al. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun.
[8] Richard Kronland-Martinet,et al. A real-time algorithm for signal analysis with the help of the wavelet transform , 1989 .
[9] P. Dutilleux. An Implementation of the “algorithme à trous” to Compute the Wavelet Transform , 1989 .
[10] Ph. Tchamitchian,et al. Wavelets: Time-Frequency Methods and Phase Space , 1992 .
[11] Yoshinori Sagisaka,et al. ATR μ-talk speech synthesis system , 1992, ICSLP.
[12] Tony Robinson,et al. Speech synthesis using artificial neural networks trained on cepstral coefficients , 1993, EUROSPEECH.
[13] Jonathan G. Fiscus,et al. DARPA TIMIT: acoustic-phonetic continuous speech corpus CD-ROM, NIST speech disc 1-1.1 , 1993 .
[14] Biing-Hwang Juang,et al. Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.
[15] C. M. Bishop. Mixture Density Networks , 1994 .
[16] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[17] Noel Massey,et al. Text-to-speech conversion with neural networks: a recurrent TDNN approach , 1998, EUROSPEECH.
[18] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[19] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun.
[20] Hideki Kawahara,et al. Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT , 2001, MAVEBA.
[21] S. Peltonen,et al. Nonlinear filter design: methodologies and challenges , 2001, ISPA 2001. Proceedings of the 2nd International Symposium on Image and Signal Processing and Analysis. In conjunction with 23rd International Conference on Information Technology Interfaces (IEEE Cat..
[22] 吉村 貴克,et al. Simultaneous modeling of phonetic and prosodic parameters,and characteristic conversion for HMM-based text-to-speech systems , 2002 .
[23] 全 炳河,et al. Reformulating HMM as a trajectory model by imposing explicit relationships between static and dynamic features , 2006 .
[24] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[25] Keiichi Tokuda,et al. A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis , 2007, IEICE Trans. Inf. Syst.
[26] Heiga Zen,et al. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences , 2007, Comput. Speech Lang..
[27] Keiichi Tokuda,et al. Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis , 2008, INTERSPEECH.
[28] Keiichi Tokuda,et al. Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[29] Edith Law,et al. Input-agreement: a new mechanism for collecting data using human computation games , 2009, CHI.
[30] Heiga Zen,et al. Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters , 2010, SSW.
[31] Geoffrey E. Hinton,et al. Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.
[32] Dimitri Palaz,et al. Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks , 2013, INTERSPEECH.
[33] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[34] Alan W. Black,et al. A Deep Learning Approach to Data-driven Parameterizations for Statistical Parametric Speech Synthesis , 2014, ArXiv.
[35] Yoshihiko Nankaku,et al. Integration of Spectral Feature Extraction and Modeling for HMM-Based Speech Synthesis , 2014, IEICE Trans. Inf. Syst.
[36] Frank K. Soong,et al. TTS synthesis with bidirectional LSTM based recurrent neural networks , 2014, INTERSPEECH.
[37] Hermann Ney,et al. Acoustic modeling with deep neural networks using raw time signal for LVCSR , 2014, INTERSPEECH.
[38] Yannis Agiomyrgiannakis,et al. Vocaine the vocoder and applications in speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.
[40] Ron J. Weiss,et al. Speech acoustic modeling from raw multichannel waveforms , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[41] Cassia Valentini-Botinhao,et al. Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[42] Heiga Zen,et al. Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Matthias Bethge,et al. Generative Image Modeling Using Spatial LSTMs , 2015, NIPS.
[44] Tara N. Sainath,et al. Learning the speech front-end with raw waveform CLDNNs , 2015, INTERSPEECH.
[45] Alex Graves,et al. Conditional Image Generation with PixelCNN Decoders , 2016, NIPS.
[46] Heiga Zen,et al. Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices , 2016, INTERSPEECH.
[47] Alexander Gutkin,et al. Recent Advances in Google Real-Time HMM-Driven Unit Selection Synthesizer , 2016, INTERSPEECH.
[48] Junichi Yamagishi,et al. A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[49] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Yonghui Wu,et al. Exploring the Limits of Language Modeling , 2016, ArXiv.
[51] Koray Kavukcuoglu,et al. Pixel Recurrent Neural Networks , 2016, ICML.
[52] Tomoki Toda,et al. Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis , 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[53] Heiga Zen,et al. Directly modeling voiced and unvoiced components in speech waveforms by neural networks , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[54] Vladlen Koltun,et al. Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.
[55] Masanori Morise,et al. WORLD: A Vocoder-Based High-Quality Speech Synthesis System for Real-Time Applications , 2016, IEICE Trans. Inf. Syst.
[56] Junichi Yamagishi,et al. SUPERSEDED - CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit , 2016 .