Chenliang Xu | Zhiheng Li | Zhiyao Duan | Ross K. Maddox | Lele Chen
[1] Joon Son Chung,et al. Lip Reading in the Wild , 2016, ACCV.
[2] Derek R. Magee,et al. Virtual Immortality: Reanimating Characters from TV Shows , 2016, ECCV Workshops.
[3] Li Fei-Fei,et al. Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.
[4] Roger Levy,et al. A new approach to cross-modal multimedia retrieval , 2010, ACM Multimedia.
[5] H. Hotelling. Relations Between Two Sets of Variates , 1936.
[6] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[7] Chenliang Xu,et al. Deep Cross-Modal Audio-Visual Generation , 2017, ACM Multimedia.
[8] Andrew Owens,et al. Visually Indicated Sounds , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Shimon Whiteson,et al. LipNet: End-to-End Sentence-level Lipreading , 2016, arXiv:1611.01599.
[10] Yoshua Bengio,et al. Generative Adversarial Nets , 2014, NIPS.
[11] Jian Sun,et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[12] Patrick Pérez,et al. VDub: Modifying Face Video of Actors for Plausible Visual Alignment to a Dubbed Audio Track , 2015, Comput. Graph. Forum.
[13] Pieter Abbeel,et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets , 2016, NIPS.
[14] Simon Osindero,et al. Conditional Generative Adversarial Nets , 2014, ArXiv.
[15] Larry S. Davis,et al. Look who's talking: speaker detection using video and audio correlation , 2000, 2000 IEEE International Conference on Multimedia and Expo (ICME).
[16] Joon Son Chung,et al. You said that? , 2017, BMVC.
[17] Antonio Torralba,et al. Generating Videos with Scene Dynamics , 2016, NIPS.
[18] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[19] Jonathon Shlens,et al. Conditional Image Synthesis with Auxiliary Classifier GANs , 2016, ICML.
[20] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[21] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[22] Lei Xie,et al. Photo-real talking head with deep bidirectional LSTM , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[23] Eero P. Simoncelli,et al. Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.
[24] Lina J. Karam,et al. A No-Reference Image Blur Metric Based on the Cumulative Probability of Blur Detection (CPBD) , 2011, IEEE Transactions on Image Processing.
[25] Joon Son Chung,et al. Lip Reading Sentences in the Wild , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Jon Barker,et al. An audio-visual corpus for speech perception and automatic speech recognition , 2006, The Journal of the Acoustical Society of America.
[27] Ira Kemelmacher-Shlizerman,et al. Synthesizing Obama , 2017, ACM Trans. Graph.
[28] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[29] Asif A. Ghazanfar,et al. The Natural Statistics of Audiovisual Speech , 2009, PLoS Comput. Biol.
[30] Davis E. King,et al. Dlib-ml: A Machine Learning Toolkit , 2009, J. Mach. Learn. Res.
[31] Geoffrey E. Hinton,et al. Phoneme recognition using time-delay neural networks , 1989, IEEE Trans. Acoust. Speech Signal Process.