Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders
[1] Pramit Saha,et al. Towards Automatic Speech Identification from Vocal Tract Shape Dynamics in Real-time MRI , 2018, INTERSPEECH.
[2] Tamás Gábor Csapó,et al. Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images. , 2017, The Journal of the Acoustical Society of America.
[3] Pierre Roussel-Ragot,et al. An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging , 2016, INTERSPEECH.
[4] Gérard Chollet,et al. Statistical Mapping Between Articulatory and Acoustic Data for an Ultrasound-Based Silent Speech Interface , 2011, INTERSPEECH.
[5] Tanja Schultz,et al. Direct conversion from facial myoelectric signals to speech using Deep Neural Networks , 2015, 2015 International Joint Conference on Neural Networks (IJCNN).
[6] J. M. Gilbert,et al. Development of a (silent) speech recognition system for patients following laryngectomy. , 2008, Medical engineering & physics.
[7] Shrikanth Narayanan,et al. Real-time magnetic resonance imaging and electromagnetic articulography database for speech production research (TC). , 2014, The Journal of the Acoustical Society of America.
[8] Phil D. Green,et al. A silent speech system based on permanent magnet articulography and direct synthesis , 2016, Comput. Speech Lang..
[9] Gábor Gosztolya,et al. DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech Interface , 2017, INTERSPEECH.
[10] Petros Maragos,et al. Multi-View Audio-Articulatory Features for Phonetic Recognition on RTMRI-TIMIT Database , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[11] Tanja Schultz,et al. Domain-Adversarial Training for Session Independent EMG-based Speech Recognition , 2018, INTERSPEECH.
[12] Angel Manuel Gomez,et al. A Deep Learning Loss Function Based on the Perceptual Evaluation of the Speech Quality , 2018, IEEE Signal Processing Letters.
[13] R. Kubichek,et al. Mel-cepstral distance measure for objective speech quality assessment , 1993, Proceedings of IEEE Pacific Rim Conference on Communications Computers and Signal Processing.
[14] Tam'as G'abor Csap'o. Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract , 2020, INTERSPEECH.
[15] Jun Wang,et al. Sentence recognition from articulatory movements for silent speech interfaces , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Laurent Girin,et al. Real-Time Control of an Articulatory-Based Speech Synthesizer for Brain Computer Interfaces , 2016, PLoS Comput. Biol..
[17] Bin Liu,et al. Estimate articulatory MRI series from acoustic signal using deep architecture , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[18] DeLiang Wang,et al. On Adversarial Training and Loss Functions for Speech Enhancement , 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[19] J. M. Gilbert,et al. Silent speech interfaces , 2010, Speech Commun..
[20] László Tóth,et al. 3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces , 2020, ICAISC.
[21] James T. Heaton,et al. Silent Speech Recognition as an Alternative Communication Device for Persons With Laryngectomy , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[22] Myungjong Kim,et al. Speaker-Independent Silent Speech Recognition From Flesh-Point Articulatory Movements Using an LSTM Neural Network , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[23] Yoshua Bengio,et al. MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis , 2019, NeurIPS.
[24] Christian Dittmar,et al. A Comparison of Recent Neural Vocoders for Speech Signal Reconstruction , 2019, 10th ISCA Workshop on Speech Synthesis (SSW 10).
[25] Tanja Schultz,et al. Estimation of fundamental frequency from surface electromyographic data: EMG-to-F0 , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Tokihiko Kaburagi,et al. Articulatory-to-speech Conversion Using Bi-directional Long Short-term Memory , 2018, INTERSPEECH.
[27] James T. Heaton,et al. Towards a practical silent speech recognition system , 2014, INTERSPEECH.
[28] Jun Wang,et al. Preliminary Test of a Real-Time, Interactive Silent Speech Interface Based on Electromagnetic Articulograph , 2014, SLPAT@ACL.
[29] Jesper Jensen,et al. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech , 2011, IEEE Transactions on Audio, Speech, and Language Processing.
[30] Sidney Fels,et al. Ultra2Speech - A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images , 2020, MICCAI.
[31] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[32] Tamás Gábor Csapó,et al. Ultrasound-based Articulatory-to-Acoustic Mapping with WaveGlow Speech Synthesis , 2020, INTERSPEECH.
[33] Shrikanth Narayanan,et al. Advances in vocal tract imaging and analysis , 2019, The Routledge Handbook of Phonetics.
[34] Quoc V. Le,et al. Swish: a Self-Gated Activation Function , 2017, arXiv:1710.05941.
[35] Shrikanth S. Narayanan,et al. Articulatory Synthesis Based on Real-Time Magnetic Resonance Imaging Data , 2016, INTERSPEECH.
[36] Athanasios Katsamanis,et al. Validating rt-MRI Based Articulatory Representations via Articulatory Recognition , 2011, INTERSPEECH.
[37] Thomas Hueber,et al. Feature extraction using multimodal convolutional neural networks for visual speech recognition , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[38] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[39] Jesper Jensen,et al. On Loss Functions for Supervised Monaural Time-Domain Speech Enhancement , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.