VoCo
Gautham J. Mysore | Zeyu Jin | Adam Finkelstein | Stephen DiVerdi | Jingwan Lu
[1] Hyung Soon Kim,et al. Narrowband to wideband conversion of speech using GMM based transformation , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[2] Xin Wang,et al. An RNN-Based Quantized F0 Model with Multi-Tier Feedback Links for Text-to-Speech Synthesis , 2017, INTERSPEECH.
[3] Li-Rong Dai,et al. Voice Conversion Using Deep Neural Networks With Layer-Wise Generative Training , 2014, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[4] Zhen-Hua Ling,et al. Voice conversion using deep neural networks with layer-wise generative training , 2014 .
[5] Sercan Ömer Arik,et al. Deep Voice 2: Multi-Speaker Neural Text-to-Speech , 2017, NIPS.
[6] Kåre Sjölander,et al. An HMM-based system for automatic segmentation and alignment of speech , 2003 .
[7] G. Forney Jr.,et al. Viterbi Algorithm , 1973, Encyclopedia of Machine Learning.
[8] Wilmot Li,et al. Tools for placing cuts and transitions in interview video , 2012, ACM Trans. Graph.
[9] Alan W. Black,et al. The CMU Arctic speech databases , 2004, SSW.
[10] Thierry Dutoit,et al. Towards a Voice Conversion System Based on Frame Selection , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[11] Gautham J. Mysore,et al. Equalization matching of speech recordings in real-world environments , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[12] Hao Wang,et al. Phonetic posteriorgrams for many-to-one voice conversion without parallel data training , 2016, 2016 IEEE International Conference on Multimedia and Expo (ICME).
[13] Stephen DiVerdi,et al. HelpingHand: example-based stroke stylization , 2012, ACM Trans. Graph.
[14] Gautham J. Mysore,et al. Fast and easy crowdsourced perceptual audio evaluation , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Heiga Zen,et al. WaveNet: A Generative Model for Raw Audio , 2016, SSW.
[16] S. Imai,et al. Mel Log Spectrum Approximation (MLSA) filter for speech synthesis , 1983 .
[17] Tetsunori Kobayashi,et al. Hybrid Voice Conversion of Unit Selection and Generation Using Prosody Dependent HMM , 2006, IEICE Trans. Inf. Syst.
[18] Eric Moulines,et al. Continuous probabilistic transform for voice conversion , 1998, IEEE Trans. Speech Audio Process.
[19] Jean Charles Bazin,et al. Painting by feature , 2013, ACM Trans. Graph.
[20] Christopher D. Manning,et al. Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.
[21] K.-F. Lee,et al. CMU robust vocabulary-independent speech recognition system , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[22] Keiichi Tokuda,et al. Speech parameter generation algorithms for HMM-based speech synthesis , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[23] Paul Boersma,et al. Praat, a system for doing phonetics by computer , 2002 .
[24] Laura A. Dabbish,et al. Simplifying video editing using metadata , 2002, DIS '02.
[25] J. Tukey,et al. An algorithm for the machine calculation of complex Fourier series , 1965 .
[26] Haizhou Li,et al. Exemplar-based unit selection for voice conversion utilizing temporal information , 2013, INTERSPEECH.
[27] I. Elamvazuthi,et al. Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques , 2010, ArXiv.
[28] Wilmot Li,et al. Content-based tools for editing audio stories , 2013, UIST.
[29] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[30] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[31] Keiichi Tokuda,et al. XIMERA: a new TTS from ATR based on corpus-based technologies , 2004, SSW.
[32] Matthew Stone,et al. Speaking with hands: creating animated conversational characters from recordings of human performance , 2004, ACM Trans. Graph.
[33] Tomoki Toda,et al. Modulation spectrum-constrained trajectory training algorithm for GMM-based Voice Conversion , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[34] Daniela Braga,et al. Evaluating Voice Quality and Speech Synthesis Using Crowdsourcing , 2013, TSD.
[35] Stephen DiVerdi,et al. CUTE: A concatenative method for voice conversion using exemplar-based unit selection , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[36] Joseph P. Olive,et al. Text-to-speech synthesis , 1995, AT&T Technical Journal.
[37] Stephen Isard,et al. Optimal coupling of diphones , 1994, SSW.
[38] Tetsuya Takiguchi,et al. Voice conversion based on Non-negative matrix factorization using phoneme-categorized dictionary , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Hideki Kawahara,et al. YIN, a fundamental frequency estimator for speech and music , 2002, The Journal of the Acoustical Society of America.
[40] Tomoki Toda,et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory , 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[41] Heiga Zen,et al. Speech Synthesis Based on Hidden Markov Models , 2013, Proceedings of the IEEE.
[42] Zhizheng Wu,et al. Deep neural network-guided unit selection synthesis , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[43] Heiga Zen,et al. Statistical parametric speech synthesis using deep neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.
[44] Keiichi Tokuda,et al. Mel-generalized cepstral analysis - a unified approach to speech spectral estimation , 1994, ICSLP.
[45] Adam Finkelstein,et al. FFTNet: A Real-Time Speaker-Dependent Neural Vocoder , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[46] Heiga Zen,et al. Statistical Parametric Speech Synthesis , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[47] Paul Taylor,et al. Automatically clustering similar units for unit selection in speech synthesis , 1997, EUROSPEECH.
[48] Alexander Kain,et al. Spectral voice conversion for text-to-speech synthesis , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[49] Michael D. Buhrmester,et al. Amazon's Mechanical Turk , 2011, Perspectives on psychological science : a journal of the Association for Psychological Science.
[50] Alan W. Black. Unit selection and emotional speech , 2003, INTERSPEECH.
[51] Kei Fujii,et al. High-Individuality Voice Conversion Based on Concatenative Speech Synthesis , 2007 .
[52] Steve Whittaker,et al. Semantic speech editing , 2004, CHI.
[53] Björn Hartmann,et al. SceneSkim: Searching and Browsing Movies Using Synchronized Captions, Scripts and Plot Summaries , 2015, UIST.
[54] David Salesin,et al. Image Analogies , 2001, SIGGRAPH.
[55] Bhiksha Raj,et al. Non-negative matrix factorization based compensation of music for automatic speech recognition , 2010, INTERSPEECH.
[56] Kishore Prahallad,et al. Voice conversion using Artificial Neural Networks , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.
[57] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Tomoki Toda,et al. One-to-Many and Many-to-One Voice Conversion Based on Eigenvoices , 2007, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07.
[59] Keiichi Tokuda,et al. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis , 1999, EUROSPEECH.
[60] Tomoki Toda,et al. Statistical Voice Conversion with WaveNet-Based Waveform Generation , 2017, INTERSPEECH.
[61] Werner Verhelst,et al. Waveform similarity based overlap-add (WSOLA) for time-scale modification of speech: structures and evaluation , 1993, EUROSPEECH.
[62] S. R. Mahadeva Prasanna,et al. A syllable-based framework for unit selection synthesis in 13 Indian languages , 2013, 2013 International Conference Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE).
[63] Takao Kobayashi,et al. Analysis of Speaker Adaptation Algorithms for HMM-Based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm , 2009, IEEE Transactions on Audio, Speech, and Language Processing.
[64] Sergey Levine,et al. Real-time prosody-driven synthesis of body language , 2009, ACM Trans. Graph.
[65] A. F. Machado,et al. Voice Conversion: A Critical Survey , 2010 .
[66] Sercan Ömer Arik,et al. Neural Voice Cloning with a Few Samples , 2018, NeurIPS.
[67] Takashi Nose,et al. Speaker and style adaptation using average voice model for style control in HMM-based speech synthesis , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[68] Hermann Ney,et al. Text-Independent Voice Conversion Based on Unit Selection , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
[69] Björn Hartmann,et al. Video digests: a browsable, skimmable format for informational lecture videos , 2014, UIST.
[70] Hideki Kawahara,et al. Tandem-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.
[71] Hideki Kawahara,et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds , 1999, Speech Commun.
[72] K. Shikano,et al. Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).
[73] DeLiang Wang,et al. Gated Residual Networks with Dilated Convolutions for Supervised Speech Separation , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[74] Simon King,et al. Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[75] Christoph Bregler,et al. Video Rewrite: Driving Visual Speech with Audio , 1997, SIGGRAPH.
[76] Takashi Nose,et al. A Style Control Technique for HMM-Based Expressive Speech Synthesis , 2007, IEICE Trans. Inf. Syst.
[77] Marc Schröder,et al. Expressive Speech Synthesis: Past, Present, and Possible Futures , 2009, Affective Information Processing.
[78] Alan W. Black,et al. Unit selection in a concatenative speech synthesis system using a large speech database , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.
[79] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.