Listening While Speaking and Visualizing: Improving ASR Through Multimodal Chain
暂无分享,去创建一个
Satoshi Nakamura | Sakriani Sakti | Johanes Effendi | Andros Tjandra | S. Sakti | Satoshi Nakamura | Johanes Effendi | Andros Tjandra
[1] Hyunsoo Kim,et al. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks , 2017, ICML.
[2] Lei Zhang,et al. Turbo Learning for Captionbot and Drawingbot , 2018, NeurIPS.
[3] Samy Bengio,et al. Tacotron: Towards End-to-End Speech Synthesis , 2017, INTERSPEECH.
[4] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[6] Satoshi Nakamura,et al. End-to-end Feedback Loss in Speech Chain Framework via Straight-through Estimator , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[7] Xirong Li,et al. Word2VisualVec: Cross-Media Retrieval by Visual Feature Prediction , 2016, ArXiv.
[8] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[9] Maja Pantic,et al. End-to-End Audiovisual Fusion with LSTMs , 2017, AVSP.
[10] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[11] Jae S. Lim,et al. Signal estimation from modified short-time Fourier transform , 1983, ICASSP.
[12] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Jason Weston,et al. WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.
[14] Ulises Cortés,et al. Studying the impact of the Full-Network embedding on multimodal pipelines , 2019, Semantic Web.
[15] Satoshi Nakamura,et al. Machine Speech Chain with One-shot Speaker Adaptation , 2018, INTERSPEECH.
[16] 拓海 杉山,et al. “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .
[17] Lin Ma,et al. Multimodal Convolutional Neural Networks for Matching Image and Sentence , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[18] Satoshi Nakamura,et al. Listening while speaking: Speech chain by deep learning , 2017, 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU).
[19] Ping Tan,et al. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[20] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Andreas Stolcke,et al. The Microsoft 2017 Conversational Speech Recognition System , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[22] Quoc V. Le,et al. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Xiaodong Cui,et al. English Conversational Telephone Speech Recognition by Humans and Machines , 2017, INTERSPEECH.
[24] Tie-Yan Liu,et al. Dual Learning for Machine Translation , 2016, NIPS.
[25] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[26] Shinji Watanabe,et al. Joint CTC-attention based end-to-end speech recognition using multi-task learning , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[27] Satoshi Nakamura,et al. Multi-Scale Alignment and Contextual History for Attention Mechanism in Sequence-to-Sequence Model , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[28] Yang Liu,et al. Towards Robust Neural Machine Translation , 2018, ACL.
[29] Qun Liu,et al. Sentence-Level Multilingual Multi-modal Embedding for Natural Language Processing , 2017, RANLP.
[30] Tomi Kinnunen,et al. INTERSPEECH 2013 14thAnnual Conference of the International Speech Communication Association , 2013, Interspeech 2015.
[31] Kou Tanaka,et al. Synthetic-to-Natural Speech Waveform Conversion Using Cycle-Consistent Adversarial Networks , 2018, 2018 IEEE Spoken Language Technology Workshop (SLT).
[32] Janet M. Baker,et al. The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.
[33] Joon Son Chung,et al. Deep Audio-Visual Speech Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.