[1] I. Pavlov,et al. The Work of the Digestive Glands , 1903, Bristol Medico-Chirurgical Journal (1883).
[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[3] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[4] B. Holden. Listen and learn , 2002.
[5] Eero P. Simoncelli,et al. Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.
[6] Michael Elad,et al. Pixels that sound , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[7] Marc'Aurelio Ranzato,et al. Video (language) modeling: a baseline for generative models of natural videos , 2014, ArXiv.
[8] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[9] Max Welling,et al. Auto-Encoding Variational Bayes , 2013, ICLR.
[10] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[11] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[12] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.
[13] Nitish Srivastava,et al. Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.
[14] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[15] Jiajun Wu,et al. Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks , 2016, NIPS.
[16] Jitendra Malik,et al. Learning Visual Predictive Models of Physics for Playing Billiards , 2015, ICLR.
[17] Antonio Torralba,et al. SoundNet: Learning Sound Representations from Unlabeled Video , 2016, NIPS.
[18] James R. Glass,et al. Unsupervised Learning of Spoken Language with Visual Context , 2016, NIPS.
[19] Antonio Torralba,et al. Anticipating Visual Representations from Unlabeled Video , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Andrew Owens,et al. Ambient Sound Provides Supervision for Visual Learning , 2016, ECCV.
[21] Luc Van Gool,et al. Dynamic Filter Networks , 2016, NIPS.
[22] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Aaron C. Courville,et al. Discriminative Regularization for Generative Models , 2016, ArXiv.
[24] Antonio Torralba,et al. Generating Videos with Scene Dynamics , 2016, NIPS.
[25] Sergey Levine,et al. Unsupervised Learning for Physical Interaction through Video Prediction , 2016, NIPS.
[26] Shunta Saito,et al. Temporal Generative Adversarial Nets with Singular Value Clipping , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[27] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[28] Chenliang Xu,et al. Deep Cross-Modal Audio-Visual Generation , 2017, ACM Multimedia.
[29] Martial Hebert,et al. The Pose Knows: Video Forecasting by Generating Pose Futures , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[30] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[31] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph.
[32] Jaakko Lehtinen,et al. Audio-driven facial animation by joint end-to-end learning of pose and emotion , 2017, ACM Trans. Graph.
[33] Andrew Zisserman,et al. Look, Listen and Learn , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[34] Jan Kautz,et al. Unsupervised Image-to-Image Translation Networks , 2017, NIPS.
[35] Seunghoon Hong,et al. Decomposing Motion and Content for Natural Video Sequence Prediction , 2017, ICLR.
[36] Philip Bachman,et al. Machine Comprehension by Text-to-Text Neural Question Generation , 2017, Rep4NLP@ACL.
[37] Yisong Yue,et al. A deep learning approach for generalized speech animation , 2017, ACM Trans. Graph.
[38] Li Fei-Fei,et al. Unsupervised Learning of Long-Term Motion Dynamics for Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Aaron C. Courville,et al. Improved Training of Wasserstein GANs , 2017, NIPS.
[40] Andrew Owens,et al. Audio-Visual Scene Analysis with Self-Supervised Multisensory Features , 2018, ECCV.
[41] Juan Carlos Niebles,et al. Learning to Decompose and Disentangle Representations for Video Prediction , 2018, NeurIPS.
[42] Yitong Li,et al. Video Generation From Text , 2017, AAAI.
[43] P. Corlett,et al. Conditioned hallucinations: historic insights and future directions , 2018, World psychiatry : official journal of the World Psychiatric Association.
[44] Sergey Levine,et al. Stochastic Variational Video Prediction , 2017, ICLR.
[45] Ali Farhadi,et al. Imagine This! Scripts to Compositions to Videos , 2018, ECCV.
[46] Gustavo K. Rohde,et al. Sliced-Wasserstein Autoencoder: An Embarrassingly Simple Generative Model , 2018, ArXiv.
[47] Serge J. Belongie,et al. Controllable Video Generation with Sparse Trajectories , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[48] Jan Kautz,et al. Video-to-Video Synthesis , 2018, NeurIPS.
[49] Ira Kemelmacher-Shlizerman,et al. Audio to Body Dynamics , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[50] Rob Fergus,et al. Stochastic Video Generation with a Learned Prior , 2018, ICML.
[51] Antonio Torralba,et al. Through-Wall Human Pose Estimation Using Radio Signals , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[52] Jan Kautz,et al. MoCoGAN: Decomposing Motion and Content for Video Generation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[53] Zhaoxiang Zhang,et al. CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation , 2017, AAAI.
[54] Alexander G. Schwing,et al. Generative Modeling Using the Sliced Wasserstein Distance , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[55] Maja Pantic,et al. End-to-End Speech-Driven Facial Animation with Temporal GANs , 2018, BMVC.
[56] Chuang Gan,et al. The Sound of Pixels , 2018, ECCV.
[57] Joon Son Chung,et al. You Said That?: Synthesising Talking Faces from Audio , 2019, International Journal of Computer Vision.
[58] Jeff Donahue,et al. Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.
[59] Luc Van Gool,et al. Sliced Wasserstein Generative Models , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Xiaogang Wang,et al. Video Generation From Single Semantic Label Map , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[61] In So Kweon,et al. Deep Video Inpainting , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Gordon Wetzstein,et al. Acoustic Non-Line-Of-Sight Imaging , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Xiaogang Wang,et al. Vision-Infused Deep Audio Inpainting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[64] Shun-Po Chuang,et al. Towards Audio to Scene Image Synthesis Using Generative Adversarial Network , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[65] Chenliang Xu,et al. Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[66] Chuang Gan,et al. The Sound of Motions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[67] Jordi Torres,et al. Wav2Pix: Speech-conditioned Face Generation Using Generative Adversarial Networks , 2019, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[68] Tae-Hyun Oh,et al. Speech2Face: Learning the Face Behind a Voice , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).