What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision
暂无分享,去创建一个
Kevin Murphy | Jonathan Malmaud | Jonathan Huang | Andrew Rabinovich | Vivek Rathod | Nicholas Johnston | Andrew Rabinovich | K. Murphy | Jonathan Huang | V. Rathod | Nick Johnston | J. Malmaud
[1] Hank Liao,et al. Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription , 2013, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding.
[2] Bo Pang,et al. Spice it up? Mining Refinements to Online Instructions from User Generated Content , 2012, ACL.
[3] Jiebo Luo,et al. Unsupervised Alignment of Natural Language Instructions with Video Segments , 2014, AAAI.
[4] Estevam R. Hruschka,et al. Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.
[5] Hema Swetha Koppula,et al. RoboBrain: Large-Scale Knowledge Engine for Robots , 2014, ArXiv.
[6] James Ze Wang,et al. The Story Picturing Engine---a system for automatic text illustration , 2006, TOMCCAP.
[7] Yi Li,et al. Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web , 2015, AAAI.
[8] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[9] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[10] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[11] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Dan Klein,et al. Learning Accurate, Compact, and Interpretable Tree Annotation , 2006, ACL.
[13] Oren Etzioni,et al. Open Information Extraction: The Second Generation , 2011, IJCAI.
[14] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[15] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[16] Wei Zhang,et al. Knowledge vault: a web-scale approach to probabilistic knowledge fusion , 2014, KDD.
[17] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[18] Bernt Schiele,et al. Script Data for Attribute-Based Recognition of Composite Activities , 2012, ECCV.
[19] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[20] Matthieu Guillaumin,et al. Food-101 - Mining Discriminative Components with Random Forests , 2014, ECCV.
[21] Alexander G. Hauptmann,et al. Instructional Videos for Unsupervised Harvesting and Learning of Action Examples , 2014, ACM Multimedia.
[22] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[23] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[24] Gerhard Weikum,et al. YAGO: A Large Ontology from Wikipedia and WordNet , 2008, J. Web Semant..