Whisper to Normal Speech Conversion Using Sequence-to-Sequence Mapping Model With Auditory Attention
Wenming Zheng | Hailun Lian | Jian Zhou | Weiwei Yu | Yuting Hu
[1] R. Kubichek, et al. Mel-cepstral distance measure for objective speech quality assessment, 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[2] Tomoki Toda, et al. NAM-to-speech conversion with Gaussian mixture models, 2005, INTERSPEECH.
[3] Tomoki Toda, et al. Voice Conversion Based on Maximum-Likelihood Estimation of Spectral Parameter Trajectory, 2007, IEEE Transactions on Audio, Speech, and Language Processing.
[4] Tanja Schultz, et al. Fundamental frequency generation for whisper-to-audible speech conversion, 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[5] Keiichi Tokuda, et al. Speech parameter generation algorithms for HMM-based speech synthesis, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).
[6] Dorde T. Grozdic, et al. Whispered Speech Recognition Using Deep Denoising Autoencoder and Inverse Filtering, 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[7] Hideki Kawahara, et al. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, 1999, Speech Commun.
[8] Zhao Heming, et al. Performance analysis of mandarin whispered speech recognition based on normal speech training model, 2016, 2016 Sixth International Conference on Information Science and Technology (ICIST).
[9] Jürgen Schmidhuber, et al. Learning to forget: continual prediction with LSTM, 1999.
[10] Prasanta Kumar Ghosh, et al. Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs, 2018, INTERSPEECH.
[11] Mark J. T. Smith, et al. Voice conversion based on a mixture density network, 2017, 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA).
[12] Mikihiro Nakagiri, et al. Statistical Voice Conversion Techniques for Body-Conducted Unvoiced Speech Enhancement, 2012, IEEE Transactions on Audio, Speech, and Language Processing.
[13] John H. L. Hansen, et al. Speaker Identification Within Whispered Speech Audio Streams, 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[14] Mo Fuyuan. A linear prediction algorithm in low bit rate speech coding improved by multi-band excitation model, 2001.
[15] Mark A. Clements, et al. Reconstruction of speech from whispers, 2002, MAVEBA.
[16] Ian McLoughlin, et al. Whisper-to-speech conversion using restricted Boltzmann machine arrays, 2014.
[17] Yan Song, et al. Reconstruction of continuous voiced speech from whispers, 2013, INTERSPEECH.
[18] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.
[19] M. Ramos. Voice Conversion with Deep Learning, 2016.
[20] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.
[21] Ian Vince McLoughlin, et al. Analysis-by-synthesis method for whisper-speech reconstruction, 2008, APCCAS 2008 - 2008 IEEE Asia Pacific Conference on Circuits and Systems.
[22] John H. L. Hansen, et al. Generative Modeling of Pseudo-Whisper for Robust Whispered Speech Recognition, 2016, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Aníbal Ferreira, et al. Implantation of voicing on whispered speech using frequency-domain parametric modelling of source and filter information, 2016, 2016 International Symposium on Signal, Image, Video and Communications (ISIVC).
[24] Björn W. Schuller, et al. Exploitation of Phase-Based Features for Whispered Speech Emotion Recognition, 2016, IEEE Access.
[25] Ian McLoughlin, et al. Regeneration of Speech in Voice-Loss Patients, 2009.
[26] Yan Song, et al. Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation, 2015, ACM Trans. Access. Comput.
[27] Tomoki Toda, et al. Silent-speech enhancement using body-conducted vocal-tract resonance signals, 2010, Speech Commun.
[28] Jesper Jensen, et al. A short-time objective intelligibility measure for time-frequency weighted noisy speech, 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.