Generating Images From Spoken Descriptions
暂无分享,去创建一个
Alan Hanjalic | Jihua Zhu | Xinsheng Wang | Odette Scharenborg | Tingting Qiao | Jihua Zhu | A. Hanjalic | O. Scharenborg | Xinsheng Wang | Tingting Qiao
[1] Wen Gao,et al. Direct Speech-to-Image Translation , 2020, IEEE Journal of Selected Topics in Signal Processing.
[2] Nenghai Yu,et al. Semantics Disentangling for Text-To-Image Generation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Bolei Zhou,et al. Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.
[4] Gregory Shakhnarovich,et al. Semantic Speech Retrieval With a Visually Grounded Model of Untranscribed Speech , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[5] Ryan Prenger,et al. Waveglow: A Flow-based Generative Network for Speech Synthesis , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[6] Wojciech Zaremba,et al. Improved Techniques for Training GANs , 2016, NIPS.
[7] Lei Zhang,et al. Object-Driven Text-To-Image Synthesis via Adversarial Training , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[9] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[10] Ping Tan,et al. DualGAN: Unsupervised Dual Learning for Image-to-Image Translation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[11] Yiming Wang,et al. Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks , 2018, INTERSPEECH.
[12] Jordi Torres,et al. Wav2Pix: Speech-conditioned Face Generation Using Generative Adversarial Networks , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[13] Florian Metze,et al. Speech Technology for Unwritten Languages , 2020, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[14] Florian Metze,et al. Learned in Speech Recognition: Contextual Acoustic Word Embeddings , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Chen Change Loy,et al. Everybody’s Talkin’: Let Me Talk as You Want , 2020, IEEE Transactions on Information Forensics and Security.
[16] James R. Glass,et al. Towards Visually Grounded Sub-word Speech Unit Discovery , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[17] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[18] Bernt Schiele,et al. F-VAEGAN-D2: A Feature Generating Framework for Any-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Sanjeev Khudanpur,et al. Librispeech: An ASR corpus based on public domain audio books , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[20] Kyogu Lee,et al. From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech , 2020, ICLR.
[21] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Tae-Hyun Oh,et al. Speech2Face: Learning the Face Behind a Voice , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Bhiksha Raj,et al. Face Reconstruction from Voice using Generative Adversarial Networks , 2019, NeurIPS.
[24] Geoffrey E. Hinton,et al. Visualizing Data using t-SNE , 2008 .
[25] Dimitris N. Metaxas,et al. StackGAN: Text to Photo-Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Zhe Gan,et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[27] Sepp Hochreiter,et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.
[28] Pietro Perona,et al. The Caltech-UCSD Birds-200-2011 Dataset , 2011 .
[29] Shijian Lu,et al. Cascade EF-GAN: Progressive Facial Expression Editing With Local Focuses , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Bernt Schiele,et al. Learning Deep Representations of Fine-Grained Visual Descriptions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[32] Xiaogang Wang,et al. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] Jing Zhang,et al. MirrorGAN: Learning Text-To-Image Generation by Redescription , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Andrew Zisserman,et al. Automated Flower Classification over a Large Number of Classes , 2008, 2008 Sixth Indian Conference on Computer Vision, Graphics & Image Processing.
[35] Jihua Zhu,et al. S2IGAN: Speech-to-Image Generation via Adversarial Learning , 2020, INTERSPEECH.
[36] Jihua Zhu,et al. Domain segmentation and adjustment for generalized zero-shot learning , 2020, ArXiv.
[37] James R. Glass,et al. Deep multimodal semantic embeddings for speech and images , 2015, 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU).
[38] Xin Li,et al. Semantics-Enhanced Adversarial Nets for Text-to-Image Synthesis , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[39] Dacheng Tao,et al. Perceptual Adversarial Networks for Image-to-Image Transformation , 2017, IEEE Transactions on Image Processing.
[40] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[41] James R. Glass,et al. Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input , 2018, ECCV.
[42] Eduardo de Campos Valadares,et al. Dancing to the music , 2000 .
[43] Navdeep Jaitly,et al. Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions , 2017, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[44] Prateek Verma,et al. Audio-linguistic Embeddings for Spoken Sentences , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[45] Dan Su,et al. Speaker Independent and Multilingual/Mixlingual Speech-Driven Talking Head Generation Using Phonetic Posteriorgrams , 2020, ArXiv.
[46] Rui Wang,et al. Deep Audio-visual Learning: A Survey , 2020, International Journal of Automation and Computing.
[47] Mirjam Ernestus,et al. Language learning using Speech to Image retrieval , 2019, INTERSPEECH.
[48] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Daniel Povey,et al. The Kaldi Speech Recognition Toolkit , 2011 .
[50] James R. Glass,et al. Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.
[51] Zi Huang,et al. Leveraging the Invariant Side of Generative Zero-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Shiguang Shan,et al. Generative Adversarial Network with Spatial Attention for Face Attribute Editing , 2018, ECCV.
[53] Thomas Fevens,et al. Dual Adversarial Inference for Text-to-Image Synthesis , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[54] Gabriel Ilharco,et al. Large-Scale Representation Learning from Visually Grounded Untranscribed Speech , 2019, CoNLL.
[55] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.