Computer Vision and Natural Language Processing
Peratham Wiriyathammabhum | Douglas Summers-Stay | Yiannis Aloimonos | Cornelia Fermüller
[1] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .
[2] Noah A. Smith,et al. Learning Word Representations with Hierarchical Sparse Coding , 2014, ICML.
[3] Robert Pless,et al. A Survey of Manifold Learning for Images , 2009, IPSJ Trans. Comput. Vis. Appl..
[4] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[5] Anima Anandkumar,et al. Tensor decompositions for learning latent variable models , 2012, J. Mach. Learn. Res..
[6] Nadav Cohen,et al. On the Expressive Power of Deep Learning: A Tensor Analysis , 2015, COLT 2016.
[7] Mark Steedman,et al. Combined Distributional and Logical Semantics , 2013, TACL.
[8] Deb Roy,et al. Grounded Situation Models for Robots: Where words and percepts meet , 2006, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[9] Gemma Boleda,et al. Distributional Semantics in Technicolor , 2012, ACL.
[10] Dieter Fox,et al. Attribute based object identification , 2013, 2013 IEEE International Conference on Robotics and Automation.
[11] Martha Palmer,et al. Verbnet: a broad-coverage, comprehensive verb lexicon , 2005 .
[12] Xiaoou Tang,et al. A large-scale car dataset for fine-grained categorization and verification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Peter Stone,et al. Learning to Interpret Natural Language Commands through Human-Robot Dialog , 2015, IJCAI.
[14] Andrew Y. Ng,et al. Parsing Natural Scenes and Natural Language with Recursive Neural Networks , 2011, ICML.
[15] Yee Whye Teh,et al. Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004.
[16] Thomas Hofmann,et al. Probabilistic latent semantic indexing , 1999, SIGIR '99.
[17] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[18] Jeffrey Mark Siskind,et al. Grounded Language Learning from Video Described with Sentences , 2013, ACL.
[19] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[20] Jon Oberlander,et al. Generating Instructions in Virtual Environments (GIVE):A Challenge and an Evaluation Testbed for NLG , 2007 .
[21] Omer Levy,et al. Neural Word Embedding as Implicit Matrix Factorization , 2014, NIPS.
[22] Patrick Pantel,et al. From Frequency to Meaning: Vector Space Models of Semantics , 2010, J. Artif. Intell. Res..
[23] R. Manmatha,et al. Multiple Bernoulli relevance models for image and video annotation , 2004, CVPR 2004.
[24] Eren Erdal Aksoy,et al. Learning the semantics of object–action relations by observation , 2011, Int. J. Robotics Res..
[25] Yiannis Aloimonos,et al. Detection of Manipulation Action Consequences (MAC) , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[26] Kate Saenko,et al. Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild , 2014, COLING.
[27] Hugh F. Durrant-Whyte,et al. Simultaneous localization and mapping: part I , 2006, IEEE Robotics & Automation Magazine.
[28] Michael Beetz,et al. Improving robot manipulation through fingertip perception , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[29] Katrin Erk,et al. A Formal Approach to Linking Logical Form and Vector-Space Lexical Semantics , 2014 .
[30] Yoshua Bengio,et al. Deep Architectures for Baby AI , 2007 .
[31] Yiannis Aloimonos,et al. The Cognitive Dialogue: A new model for vision implementing common sense reasoning , 2015, Image Vis. Comput..
[32] Michael Isard,et al. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.
[33] Daniel Marcu,et al. Parsing English into Abstract Meaning Representation Using Syntax-Based Machine Translation , 2015, EMNLP.
[34] Jason Weston,et al. End-To-End Memory Networks , 2015, NIPS.
[35] Jitendra Malik,et al. Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[37] Marwan Mattar,et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments , 2008 .
[38] David J. Kriegman,et al. Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection , 1996, ECCV.
[39] Yoshua Bengio,et al. Exploring Strategies for Training Deep Neural Networks , 2009, J. Mach. Learn. Res..
[40] Karl Stratos,et al. Detecting Visual Text , 2012, NAACL.
[41] Raymond J. Mooney,et al. Learning to Parse Database Queries Using Inductive Logic Programming , 1996, AAAI/IAAI, Vol. 2.
[42] Richard Socher,et al. Dynamic Memory Networks for Visual and Textual Question Answering , 2016, ICML.
[43] Karl Stratos,et al. Midge: Generating Image Descriptions From Computer Vision Detections , 2012, EACL.
[44] Roy Schwartz,et al. Symmetric Pattern Based Word Embeddings for Improved Word Similarity Prediction , 2015, CoNLL.
[45] Graeme Hirst,et al. Computing Lexical Contrast , 2013, CL.
[46] David A. Forsyth,et al. Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary , 2002, ECCV.
[47] Michael S. Bernstein,et al. Visual7W: Grounded Question Answering in Images , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[48] Alexander C. Berg,et al. Automatic Attribute Discovery and Characterization from Noisy Web Data , 2010, ECCV.
[49] Stephen Clark,et al. Combining Symbolic and Distributional Models of Meaning , 2007, AAAI Spring Symposium: Quantum Interaction.
[50] Paul Strauss,et al. Foundations Of The Theory Of Signs , 2016 .
[51] B. Bloom. Taxonomy of educational objectives , 1956 .
[52] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[53] C. Morris. Foundations of the theory of signs , 1938 .
[54] Michael I. Jordan,et al. Modeling annotated data , 2003, SIGIR.
[55] Hanqing Lu,et al. What Visual Attributes Characterize an Object Class? , 2014, ACCV.
[56] Devi Parikh,et al. Modeling context for image understanding: When, for what, and how? , 2009 .
[57] Mark J. Huiskes,et al. The MIR flickr retrieval evaluation , 2008, MIR '08.
[58] Chitta Baral,et al. The NL2KR System , 2013, NLPAR@LPNMR.
[60] Abhinav Gupta,et al. 3D Shape Attributes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Azriel Rosenfeld,et al. Face recognition: A literature survey , 2003, CSUR.
[62] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..
[63] Jeffrey Mark Siskind,et al. Seeing What You're Told: Sentence-Guided Activity Recognition in Video , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[64] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[65] David M. Blei,et al. Probabilistic topic models , 2012, Commun. ACM.
[66] Christopher D. Manning,et al. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.
[67] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..
[68] Christopher Ré,et al. Building a Large-scale Multimodal Knowledge Base for Visual Question Answering , 2015, ArXiv.
[69] Mark S. Seidenberg,et al. Semantic feature production norms for a large set of living and nonliving things , 2005, Behavior research methods.
[70] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[71] Larry S. Davis,et al. Selecting Relevant Web Trained Concepts for Automated Event Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[72] Yiannis Aloimonos,et al. The minimalist grammar of action , 2012, Philosophical Transactions of the Royal Society B: Biological Sciences.
[73] Ulises Cortés,et al. Extracting Visual Patterns from Deep Learning Representations , 2015, ArXiv.
[74] Nikolaos Mavridis,et al. A review of verbal and non-verbal human-robot interactive communication , 2014, Robotics Auton. Syst..
[75] Yiannis Aloimonos,et al. Shadow free segmentation in still images using local density measure , 2014, 2014 IEEE International Conference on Computational Photography (ICCP).
[76] Antonio Torralba,et al. LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.
[77] Quoc V. Le,et al. Grounded Compositional Semantics for Finding and Describing Images with Sentences , 2014, TACL.
[78] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[79] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[80] Richard Socher,et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing , 2015, ICML.
[81] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[82] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[83] Oren Etzioni,et al. Machine Reading , 2006, AAAI.
[84] Luke S. Zettlemoyer,et al. Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars , 2005, UAI.
[85] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[86] A. S. M. Ashique Mahmood,et al. Literature Survey on Topic Modeling , 2013 .
[87] Hoifung Poon,et al. Unsupervised Semantic Parsing , 2009, EMNLP.
[88] Daphne Koller,et al. Learning Spatial Context: Using Stuff to Find Things , 2008, ECCV.
[89] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[90] M. Turk,et al. Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.
[91] John B. Lowe,et al. The Berkeley FrameNet Project , 1998, ACL.
[92] C. Alberini,et al. Memory , 2006, Cellular and Molecular Life Sciences CMLS.
[93] T. Plate. A Common Framework for Distributed Representation Schemes for Compositional Structure , 1997 .
[94] Y. Mori,et al. Image-to-word transformation based on dividing and vector quantizing images with words , 1999 .
[95] Anton van den Hengel,et al. Image-Based Recommendations on Styles and Substitutes , 2015, SIGIR.
[96] Silvio Savarese,et al. Recognizing human actions by attributes , 2011, CVPR 2011.
[97] Luc Van Gool,et al. Speeded-Up Robust Features (SURF) , 2008, Comput. Vis. Image Underst..
[98] Luke Fletcher,et al. A Situationally Aware Voice‐commandable Robotic Forklift Working Alongside People in Unstructured Outdoor Environments , 2015, J. Field Robotics.
[99] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[100] Svetlana Lazebnik,et al. Superparsing , 2010, International Journal of Computer Vision.
[101] Geoffrey E. Hinton,et al. Distributed Representations , 1986, The Philosophy of Artificial Intelligence.
[102] Chet Meyers,et al. Promoting Active Learning: Strategies for the College Classroom , 1993 .
[103] Jitendra Malik,et al. Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[104] Xin Rong,et al. word2vec Parameter Learning Explained , 2014, ArXiv.
[105] Antonio Torralba,et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.
[106] S T Roweis,et al. Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.
[107] Kunio Fukunaga,et al. Natural Language Description of Human Activities from Video Images Based on Concept Hierarchy of Actions , 2002, International Journal of Computer Vision.
[108] Jure Leskovec,et al. Inferring Networks of Substitutable and Complementary Products , 2015, KDD.
[109] Yiannis Aloimonos,et al. Learning the spatial semantics of manipulation actions through preposition grounding , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[110] Kais Dukes,et al. SemEval-2014 Task 6: Supervised Semantic Parsing of Robotic Spatial Commands , 2014, *SEMEVAL.
[111] J. Stevens,et al. The Origin of Consciousness in the Breakdown of the Bicameral Mind , 1978, Neurology.
[112] Zhuowen Tu,et al. Image Parsing: Unifying Segmentation, Detection, and Recognition , 2005, International Journal of Computer Vision.
[113] Jeffrey Mark Siskind,et al. A Compositional Framework for Grounding Language Inference, Generation, and Acquisition in Video , 2015, J. Artif. Intell. Res..
[114] Bingbing Ni,et al. Assistive tagging: A survey of multimedia tagging with human-computer joint exploration , 2012, CSUR.
[115] David M. Blei,et al. Supervised Topic Models , 2007, NIPS.
[116] Xiaodong Yu,et al. Active scene recognition with vision and language , 2011, 2011 International Conference on Computer Vision.
[117] Alexander Novikov,et al. Tensorizing Neural Networks , 2015, NIPS.
[118] Chris Dyer,et al. Notes on Noise Contrastive Estimation and Negative Sampling , 2014, ArXiv.
[119] Ali Farhadi,et al. Situation Recognition: Visual Semantic Role Labeling for Image Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[120] Cordelia Schmid,et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[121] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[122] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[123] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[124] Jitendra Malik,et al. The three R's of computer vision: Recognition, reconstruction and reorganization , 2016, Pattern Recognit. Lett..
[125] Rada Mihalcea,et al. Going Beyond Text: A Hybrid Image-Text Approach for Measuring Word Relatedness , 2011, IJCNLP.
[126] P. Gärdenfors. The Geometry of Meaning: Semantics Based on Conceptual Spaces , 2014 .
[127] Luke S. Zettlemoyer,et al. A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.
[128] Elia Bruni,et al. Multimodal Distributional Semantics , 2014, J. Artif. Intell. Res..
[129] David A. Forsyth,et al. Animals on the Web , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[130] Bernt Schiele,et al. Coherent Multi-sentence Video Description with Variable Level of Detail , 2014, GCPR.
[131] Jonathan T. Barron,et al. Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[132] Chong Wang,et al. Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[133] Ali Farhadi,et al. Visalogy: Answering Visual Analogy Questions , 2015, NIPS.
[134] Ming Yang,et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[135] Stéphane Herbin,et al. Semantic hierarchies for image annotation: A survey , 2012, Pattern Recognit..
[136] Aapo Hyvärinen,et al. Noise-Contrastive Estimation of Unnormalized Statistical Models, with Applications to Natural Image Statistics , 2012, J. Mach. Learn. Res..
[137] Yejin Choi,et al. Composing Simple Image Descriptions using Web-scale N-grams , 2011, CoNLL.
[138] Anima Anandkumar,et al. A Method of Moments for Mixture Models and Hidden Markov Models , 2012, COLT.
[139] Yiannis Aloimonos,et al. Contour Motion Estimation for Asynchronous Event-Driven Cameras , 2014, Proceedings of the IEEE.
[140] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[141] Raymond J. Mooney,et al. Learning to Connect Language and Perception , 2008, AAAI.
[142] Shree K. Nayar,et al. Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.
[143] Dan Klein,et al. Learning to Compose Neural Networks for Question Answering , 2016, NAACL.
[144] Yee Whye Teh,et al. Names and faces in the news , 2004, CVPR 2004.
[145] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[146] Estevam R. Hruschka,et al. Toward an Architecture for Never-Ending Language Learning , 2010, AAAI.
[147] Gökhan Bakır,et al. Predicting Structured Data , 2008 .
[148] Li Ren. A Survey on Statistical Topic Modeling , 2013 .
[149] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[150] Jennifer A. Strangfeld. Promoting Active Learning , 2013 .
[151] Mary Czerwinski,et al. Voicepedia: towards speech-based access to unstructured information , 2007, INTERSPEECH.
[152] Yiannis Aloimonos,et al. Computer Vision and Natural Language Processing , 2016 .
[153] Kyoritsu Shuppan Co., Ltd. Computer Science: ACM Computing Surveys , 1978 .
[154] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[155] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[156] A. Bandura. Psychological Modeling; Conflicting Theories , 1971 .
[157] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[158] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[159] Brian McMahan,et al. A Bayesian Model of Grounded Color Semantics , 2015, TACL.
[160] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[161] J. L. Austin,et al. The foundations of arithmetic : a logico-mathematical enquiry into the concept of number , 1951 .
[162] Matthew W. Crocker,et al. Exploiting Listener Gaze to Improve Situated Communication in Dynamic Virtual Environments , 2016, Cogn. Sci..
[163] Laura A. Dabbish,et al. Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.
[164] Nazli Ikizler-Cinbis,et al. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures , 2016, J. Artif. Intell. Res..
[165] M. Carrasco. Visual attention: The past 25 years , 2011, Vision Research.
[166] Yiannis Aloimonos,et al. Cluttered scene segmentation using the symmetry constraint , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).
[167] Leonidas J. Guibas,et al. Human action recognition by learning bases of action attributes and parts , 2011, 2011 International Conference on Computer Vision.
[168] Song-Chun Zhu,et al. Attribute And-Or Grammar for Joint Parsing of Human Attributes, Part and Pose , 2016, ArXiv.
[169] Eric O. Postma,et al. Dimensionality Reduction: A Comparative Review , 2008 .
[170] Thomas A. Schreiber,et al. The University of South Florida free association, rhyme, and word fragment norms , 2004, Behavior research methods, instruments, & computers : a journal of the Psychonomic Society, Inc.
[171] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[172] A. Chemero. An Outline of a Theory of Affordances , 2003, How Shall Affordances be Refined? Four Perspectives.
[173] David A. Forsyth,et al. Learning the semantics of words and pictures , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.
[174] Md. Monirul Islam,et al. A review on automatic image annotation techniques , 2012, Pattern Recognit..
[175] Jeffrey Mark Siskind,et al. Simultaneous Object Detection, Tracking, and Event Recognition , 2012, ArXiv.
[176] Yiannis Aloimonos,et al. Fast 2D border ownership assignment , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[177] Konstantina Garoufi,et al. Planning-Based Models of Natural Language Generation , 2014, Lang. Linguistics Compass.
[178] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[179] Rama Chellappa,et al. Attributes for Improved Attributes: A Multi-Task Network for Attribute Classification , 2016, ArXiv.
[180] William I. Grosky,et al. Chapter II: Bridging the Semantic Gap in Image Retrieval , 2018 .
[181] Alessandro Saffiotti,et al. Anchoring Symbols to Sensor Data: Preliminary Report , 2000, AAAI/IAAI.
[182] Song-Chun Zhu,et al. Single-View 3D Scene Parsing by Attributed Grammar , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[183] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[184] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[185] Vittorio Ferrari,et al. Situational object boundary detection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[186] Kate Saenko,et al. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge , 2013, AAAI.
[187] H. Barlow. Vision: A computational investigation into the human representation and processing of visual information: David Marr. San Francisco: W. H. Freeman, 1982. pp. xvi + 397 , 1983 .
[188] A. Cangelosi. The grounding and sharing of symbols , 2006 .
[189] Tat-Seng Chua,et al. NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.
[190] Bastian Leibe,et al. Visual Object Recognition , 2011, Visual Object Recognition.
[191] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[192] J. Piaget. Play, dreams and imitation in childhood , 1951 .
[193] Ross B. Girshick. Fast R-CNN , 2015, ArXiv 1504.08083.
[194] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[195] Carina Silberer,et al. Models of Semantic Representation with Visual Attributes , 2013, ACL.
[196] Vladimir Pavlovic,et al. A New Baseline for Image Annotation , 2008, ECCV.
[197] Ali Farhadi,et al. Attribute-centric recognition for cross-category generalization , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[198] Thomas Brox,et al. FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[199] Jiaxuan Wang,et al. HICO: A Benchmark for Recognizing Human-Object Interactions in Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[200] Xiaoping Chen,et al. Ontology Based Object Categorization for Robots , 2005, PAKM.
[201] Yiannis Aloimonos,et al. Towards a Watson that sees: Language-guided action recognition for robots , 2012, 2012 IEEE International Conference on Robotics and Automation.
[202] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[203] Ali Farhadi,et al. Recognition using visual phrases , 2011, CVPR 2011.
[204] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[205] John Langford,et al. Search-based structured prediction , 2009, Machine Learning.
[206] Zellig S. Harris,et al. Distributional Structure , 1954 .
[207] Massimo Poesio,et al. Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts , 2013, EMNLP.
[208] David Yarowsky,et al. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing , 2013, EMNLP 2013.
[209] Marcus Rohrbach,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[210] Georgiana Dinu,et al. Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors , 2014, ACL.
[211] Brett Browning,et al. A survey of robot learning from demonstration , 2009, Robotics Auton. Syst..
[212] G. Rizzolatti,et al. The mirror-neuron system. , 2004, Annual review of neuroscience.
[213] Changsong Liu,et al. Towards Situated Dialogue: Revisiting Referring Expression Generation , 2013, EMNLP.
[214] Tie-Yan Liu,et al. Learning to rank for information retrieval , 2009, SIGIR.
[215] Sebastian Thrun,et al. Probabilistic robotics , 2002, CACM.
[216] Fei-Fei Li,et al. What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.
[217] Alberto Del Bimbo,et al. Socializing the Semantic Gap , 2015, ACM Comput. Surv..
[218] Gabriella Vigliocco,et al. Integrating experiential and distributional data to learn semantic representations. , 2009, Psychological review.
[219] Mark Steedman,et al. Surface structure and interpretation , 1996, Linguistic inquiry.
[220] Larry S. Davis,et al. Fast Automatic Video Retrieval using Web Images , 2015, ArXiv.
[221] Yejin Choi,et al. Collective Generation of Natural Image Descriptions , 2012, ACL.
[222] Yoshua Bengio,et al. Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.
[223] Alexander C. Berg,et al. Finding iconic images , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.
[224] Jitendra Malik,et al. Visual Semantic Role Labeling , 2015, ArXiv.
[225] Rainer Stiefelhagen,et al. Book2Movie: Aligning video scenes with book chapters , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[226] R. Manmatha,et al. A Model for Learning the Semantics of Pictures , 2003, NIPS.
[227] Jeffrey Mark Siskind,et al. Grounding the Lexical Semantics of Verbs in Visual Perception using Force Dynamics and Event Logic , 1999, J. Artif. Intell. Res..
[228] Kevin Murphy,et al. What’s Cookin’? Interpreting Cooking Videos using Text, Speech and Vision , 2015, NAACL.
[229] Raffaella Bernardi,et al. TUHOI: Trento Universal Human Object Interaction Dataset , 2014, VL@COLING.
[230] Omer Levy,et al. word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method , 2014, ArXiv.
[231] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[232] Luke S. Zettlemoyer,et al. Learning to Parse Natural Language Commands to a Robot Control System , 2012, ISER.
[233] Douglas Summers-Stay,et al. Using a minimal action grammar for activity understanding in the real world , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.
[234] Dieter Fox,et al. A large-scale hierarchical multi-view RGB-D object dataset , 2011, 2011 IEEE International Conference on Robotics and Automation.
[235] R. Fisher. The Use of Multiple Measurements in Taxonomic Problems , 1936 .
[236] Licheng Yu,et al. Visual Madlibs: Fill in the Blank Description Generation and Question Answering , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[237] Song-Chun Zhu,et al. A Unified Framework for Human-Robot Knowledge Transfer , 2015, AAAI Fall Symposia.
[238] Subhransu Maji,et al. Automatic Image Annotation using Deep Learning Representations , 2015, ICMR.
[239] D. Roy. Grounding words in perception and action: computational insights , 2005, Trends in Cognitive Sciences.
[240] Francis Ferraro,et al. On Available Corpora for Empirical Methods in Vision & Language , 2015, ArXiv.
[241] Marco Baroni,et al. Grounding Distributional Semantics in the Visual World , 2016, Lang. Linguistics Compass.
[242] Douglas Greenlee,et al. Semiotic and Significs: The Correspondence between Charles S. Peirce and Victoria Lady Welby , 1978 .
[243] Peng Wang,et al. Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge from External Sources , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[244] A. W. Evans,et al. Applying the Wizard-of-Oz Technique to Multimodal Human-Robot Dialogue , 2017, ArXiv.
[245] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[246] Christopher Joseph Pal,et al. Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research , 2015, ArXiv.
[247] Andrew Chou,et al. Semantic Parsing on Freebase from Question-Answer Pairs , 2013, EMNLP.
[248] Ali Farhadi,et al. Designing representational architectures in recognition , 2011 .
[249] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[250] Antonio Torralba,et al. Context models and out-of-context objects , 2012, Pattern Recognit. Lett..
[251] Percy Liang,et al. Compositional Semantic Parsing on Semi-Structured Tables , 2015, ACL.
[252] Ben Taskar,et al. Learning structured prediction models: a large margin approach , 2005, ICML.
[253] Raymond J. Mooney,et al. Learning to sportscast: a test of grounded language acquisition , 2008, ICML '08.
[254] Alexander M. Bronstein,et al. Three-Dimensional Face Recognition , 2005, International Journal of Computer Vision.
[255] Allan Jabri,et al. Learning Visual Features from Large Weakly Supervised Data , 2015, ECCV.
[256] Cristian Sminchisescu,et al. Constrained parametric min-cuts for automatic object segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[257] Changsong Liu,et al. Learning to Mediate Perceptual Differences in Situated Human-Robot Dialogue , 2015, AAAI.
[258] Yoshua Bengio,et al. Hierarchical Probabilistic Neural Network Language Model , 2005, AISTATS.
[260] Kazuo Kimura. Pragmatics , 1997, Language Teaching.
[261] Ross A. Knepper,et al. Asking for Help Using Inverse Semantics , 2014, Robotics: Science and Systems.
[262] Vibhav Vineet,et al. Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[263] Francis Ferraro,et al. A Survey of Current Datasets for Vision and Language Research , 2015, EMNLP.
[264] David A. Forsyth,et al. Matching Words and Pictures , 2003, J. Mach. Learn. Res..
[265] Matthew Stone,et al. Sentence generation as a planning problem , 2007, ACL.
[266] Abhinav Gupta,et al. Beyond Nouns and Verbs , 2009 .
[267] G. Adam. The relationship between attention and working memory , 2011 .
[268] Gordon Cheng,et al. New materials and advances in making electronic skin for interactive robots , 2015, Adv. Robotics.
[269] Michael Beetz,et al. Visually Tracking Football Games Based on TV Broadcasts , 2007, IJCAI.
[270] Jeffrey Mark Siskind,et al. Saying What You're Looking For: Linguistics Meets Video Search , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[271] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[272] Chitta Baral,et al. From Images to Sentences through Scene Description Graphs using Commonsense Reasoning and Knowledge , 2015, ArXiv.
[273] R. Manmatha,et al. Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.
[274] Christopher Hunt,et al. Notes on the OpenSURF Library , 2009 .
[275] Alap Karapurkar. Modeling Human Activities , 2005 .
[276] Changsong Liu,et al. Probabilistic Labeling for Efficient Referential Grounding based on Collaborative Discourse , 2014, ACL.
[277] R. J. Williams,et al. On the use of backpropagation in associative reinforcement learning , 1988, IEEE 1988 International Conference on Neural Networks.
[278] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[279] Katrin Erk,et al. Representing Meaning with a Combination of Logical Form and Vectors , 2015, ArXiv.
[280] Yiannis Aloimonos,et al. Robots with language: Multi-label visual recognition using NLP , 2013, 2013 IEEE International Conference on Robotics and Automation.
[281] B. Scassellati,et al. Who is IT? Inferring role and intent from agent motion , 2007, 2007 IEEE 6th International Conference on Development and Learning.
[282] Angel X. Chang,et al. Semantic Parsing for Text to 3D Scene Generation , 2014, ACL 2014.
[283] Yulia Tsvetkov,et al. Sparse Overcomplete Word Vector Representations , 2015, ACL.
[284] Yi Li,et al. Robot Learning Manipulation Action Plans by "Watching" Unconstrained Videos from the World Wide Web , 2015, AAAI.
[285] Jonathan H. Connell,et al. A Statistical Approach for Real-time Robust Background Subtraction and Shadow Detection , 2014 .
[286] Mikhail Belkin,et al. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation , 2003, Neural Computation.
[287] Marcus Rohrbach,et al. A Multi-scale Multiple Instance Video Description Network , 2015, ArXiv.
[288] N. Cowan. What are the differences between long-term, short-term, and working memory? , 2008, Progress in brain research.
[289] Yiannis Aloimonos,et al. Affordance detection of tool parts from geometric features , 2015, 2015 IEEE International Conference on Robotics and Automation (ICRA).
[290] Yi Li,et al. Neural Self Talk: Image Understanding via Continuous Questioning and Answering , 2015, ArXiv.
[291] Hinrich Schütze,et al. AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes , 2015, ACL.
[292] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[293] Kristen Grauman,et al. Relative attributes , 2011, 2011 International Conference on Computer Vision.
[294] Gerard Salton,et al. A vector space model for automatic indexing , 1975, CACM.
[295] Dieter Fox,et al. Object Recognition in 3D Point Clouds Using Web Data and Domain Adaptation , 2010, Int. J. Robotics Res..
[296] Thorsten Brants,et al. One billion word benchmark for measuring progress in statistical language modeling , 2013, INTERSPEECH.
[297] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[298] Xinlei Chen,et al. NEIL: Extracting Visual Knowledge from Web Data , 2013, 2013 IEEE International Conference on Computer Vision.
[299] G. Aschersleben,et al. The Theory of Event Coding (TEC): a framework for perception and action planning. , 2001, The Behavioral and brain sciences.
[300] Yejin Choi,et al. From Large Scale Image Categorization to Entry-Level Categories , 2013, 2013 IEEE International Conference on Computer Vision.
[301] J. Tenenbaum,et al. A global geometric framework for nonlinear dimensionality reduction. , 2000, Science.
[302] Michael X Cohen,et al. Organizational Routines Are Stored as Procedural Memory: Evidence from a Laboratory Study , 1994 .
[303] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[304] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[305] Yoshua Bengio,et al. A Neural Probabilistic Language Model , 2003, J. Mach. Learn. Res..
[306] Joan Bruna,et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation , 2014, NIPS.
[307] Hal Daumé,et al. Frustratingly Easy Domain Adaptation , 2007, ACL.
[308] Sanja Fidler,et al. Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[309] Yiannis Aloimonos,et al. Corpus-Guided Sentence Generation of Natural Images , 2011, EMNLP.
[310] Matthew R. Walter,et al. Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation , 2011, AAAI.
[311] Alexei A. Efros,et al. Unbiased look at dataset bias , 2011, CVPR 2011.
[312] Deriving Boolean structures from distributional vectors , 2015, Transactions of the Association for Computational Linguistics.
[313] Alexander Koller,et al. Automated Planning for Situated Natural Language Generation , 2010, ACL.
[314] Thomas Hofmann,et al. Predicting Structured Data (Neural Information Processing) , 2007 .
[315] Frank Keller,et al. Comparing Automatic Evaluation Measures for Image Description , 2014, ACL.
[316] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967 .
[317] Sabine Schulte im Walde,et al. A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities , 2013, EMNLP.
[318] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[319] David G. Lowe,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.
[320] Silvia Coradeschi,et al. A Short Review of Symbol Grounding in Robotic and Intelligent Systems , 2013, KI - Künstliche Intelligenz.
[321] Eren Erdal Aksoy,et al. Learning the Semantics of Manipulation Action , 2015, ACL.
[322] Li Fei-Fei,et al. End-to-End Learning of Action Detection from Frame Glimpses in Videos , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[323] Omer Levy,et al. Improving Distributional Similarity with Lessons Learned from Word Embeddings , 2015, TACL.
[324] Douglas Summers-Stay,et al. Productive Vision: Methods for Automatic Image Comprehension , 2013 .
[325] Anima Anandkumar,et al. A Spectral Algorithm for Latent Dirichlet Allocation , 2012, Algorithmica.
[326] Trevor Darrell,et al. YouTube2Text: Recognizing and Describing Arbitrary Activities Using Semantic Hierarchies and Zero-Shot Recognition , 2013, 2013 IEEE International Conference on Computer Vision.
[327] Wei Lin,et al. Revisiting Word Embedding for Contrasting Meaning , 2015, ACL.
[328] Frank Keller,et al. Image Description using Visual Dependency Representations , 2013, EMNLP.
[329] Jitendra Malik,et al. Indoor Scene Understanding with RGB-D Images: Bottom-up Segmentation, Object Detection and Semantic Segmentation , 2015, International Journal of Computer Vision.
[330] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[331] Yiannis Aloimonos,et al. A Language for Human Action , 2007, Computer.
[332] Jitendra Malik,et al. Simultaneous Detection and Segmentation , 2014, ECCV.
[333] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[334] Pietro Perona,et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.