Unsupervised Semantic Parsing of Video Collections
Silvio Savarese | Ashutosh Saxena | Amir Roshan Zamir | Ozan Sener
[1] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Sanja Fidler, et al. What Are You Talking About? Text-to-Image Coreference, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[3] Thomas Serre, et al. HMDB: A large video database for human motion recognition, 2011, 2011 International Conference on Computer Vision.
[4] Moritz Tenorth, et al. Understanding and executing instructions for everyday manipulation tasks from the World Wide Web, 2010, 2010 IEEE International Conference on Robotics and Automation.
[5] Raymond J. Mooney, et al. Improving Video Activity Recognition using Object Recognition and Text Mining, 2012, ECAI.
[6] Kevin Murphy, et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision, 2015, NAACL.
[7] Ramakant Nevatia, et al. DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[8] Yong Jae Lee, et al. Key-segments for video object segmentation, 2011, 2011 International Conference on Computer Vision.
[9] Jennifer Barry, et al. Bakebot: Baking Cookies with the PR2, 2011.
[10] Jeffrey Mark Siskind, et al. Grounded Language Learning from Video Described with Sentences, 2013, ACL.
[11] Jitendra Malik, et al. Recognizing action at a distance, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[12] Anoop Gupta, et al. Automatically extracting highlights for TV Baseball programs, 2000, ACM Multimedia.
[13] Cordelia Schmid, et al. Category-Specific Video Summarization, 2014, ECCV.
[14] Sanja Fidler, et al. A Sentence Is Worth a Thousand Pixels, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[15] Kristen Grauman, et al. Story-Driven Summarization for Egocentric Video, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[16] Larry S. Davis, et al. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[17] Yong Jae Lee, et al. Discovering important people and objects for egocentric video summarization, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[18] Ivan Laptev, et al. Efficient Feature Extraction, Encoding, and Classification for Action Recognition, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[19] Fei-Fei Li, et al. Deep visual-semantic alignments for generating image descriptions, 2015, CVPR.
[17] Yong Jae Lee,et al. Discovering important people and objects for egocentric video summarization , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[18] Ivan Laptev,et al. Efficient Feature Extraction, Encoding, and Classification for Action Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[19] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[20] C. Lawrence Zitnick,et al. Bringing Semantics into Focus Using Visual Abstraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[21] Michael I. Jordan,et al. Joint Modeling of Multiple Related Time Series via the Beta Process , 2011, 1111.4226.
[22] Juan Carlos Niebles,et al. Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification , 2010, ECCV.
[23] Jake K. Aggarwal,et al. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[24] Deva Ramanan,et al. Parsing Videos of Actions with Segmental Grammars , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[25] Ba Tu Truong,et al. Video abstraction: A systematic review and classification , 2007, TOMCCAP.
[26] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[27] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Fei-Fei Li,et al. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[29] Chih-Jen Lin,et al. Large-Scale Video Summarization Using Web-Image Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[30] Lei Chen,et al. Learning Action Primitives for Multi-level Video Event Understanding , 2014, ECCV Workshops.
[31] Patrick Bouthemy,et al. Better Exploiting Motion for Better Action Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[32] Fei-Fei Li,et al. Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[33] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[34] Claude E. Shannon,et al. The mathematical theory of communication , 1950 .
[35] Earl J. Wagner,et al. Cooking with Semantics , 2014, ACL 2014.
[36] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[37] Michael I. Jordan,et al. JOINT MODELING OF MULTIPLE TIME SERIES VIA THE BETA PROCESS WITH APPLICATION TO MOTION CAPTURE SEGMENTATION , 2013, 1308.4747.
[38] Lucy Vanderwende,et al. Learning the Visual Interpretation of Sentences , 2013, 2013 IEEE International Conference on Computer Vision.
[39] Cordelia Schmid,et al. The LEAR submission at Thumos 2014 , 2014 .
[40] Edwin Olson,et al. Single-Cluster Spectral Graph Partitioning for Robotics Applications , 2005, Robotics: Science and Systems.
[41] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.
[42] Cristian Sminchisescu,et al. Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[43] Hema Swetha Koppula,et al. RoboBrain: Large-Scale Knowledge Engine for Robots , 2014, ArXiv.
[44] Sven J. Dickinson,et al. Video In Sentences Out , 2012, UAI.
[45] Thomas L. Griffiths,et al. Infinite latent feature models and the Indian buffet process , 2005, NIPS.
[46] Cees G. M. Snoek,et al. University of Amsterdam at THUMOS Challenge 2014 , 2014 .
[47] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[48] B. Ripley,et al. Pattern Recognition , 1968, Nature.
[49] Dejan Pangercic, et al. Robotic roommates making pancakes, 2011, 2011 11th IEEE-RAS International Conference on Humanoid Robots.
[50] Jean Ponce, et al. Automatic annotation of human actions in video, 2009, 2009 IEEE 12th International Conference on Computer Vision.
[51] David A. Forsyth, et al. Matching Words and Pictures, 2003, J. Mach. Learn. Res.
[52] Eric P. Xing, et al. Reconstructing Storyline Graphs for Image Recommendation from Web Community Photos, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[53] T. Warren Liao, et al. Clustering of time series data - a survey, 2005, Pattern Recognit.
[54] Patrick Pérez, et al. Retrieving actions in movies, 2007, 2007 IEEE 11th International Conference on Computer Vision.
[55] Cyrus Rashtchian, et al. Every Picture Tells a Story: Generating Sentences from Images, 2010, ECCV.
[56] Fernando De la Torre, et al. Joint segmentation and classification of human actions in video, 2011, CVPR 2011.
[57] Pietro Perona, et al. A Factorization Approach to Grouping, 1998, ECCV.
[58] Silvio Savarese, et al. A Hierarchical Representation for Future Action Prediction, 2014, ECCV.
[59] Takeo Igarashi, et al. Generating photo manipulation tutorials by demonstration, 2009, ACM Trans. Graph.
[60] Eric P. Xing, et al. Joint Summarization of Large-Scale Collections of Web Images and Videos for Storyline Reconstruction, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[61] Lawrence R. Rabiner, et al. A tutorial on hidden Markov models and selected applications in speech recognition, 1989, Proc. IEEE.
[62] Cordelia Schmid, et al. Weakly Supervised Action Labeling in Videos under Ordering Constraints, 2014, ECCV.
[63] Sang Joon Kim, et al. A Mathematical Theory of Communication, 2006.
[64] Mubarak Shah, et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, 2012, arXiv.