Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Michael S. Bernstein | Li Fei-Fei | David A. Shamma | Yannis Kalantidis | Li-Jia Li | Justin Johnson | Oliver Groth | Ranjay Krishna | Kenji Hata | Stephanie Chen | Joshua Kravitz | Yuke Zhu
[1] Roger C. Schank,et al. Scripts, plans, goals and understanding: an inquiry into human knowledge structures , 1978 .
[2] C. Welin. Scripts, plans, goals and understanding, an inquiry into human knowledge structures: Roger C. Schank and Robert P. Abelson Hillsdale: Lawrence Erlbaum Associates, 1977. 248 pp. £ 10.60 hardcover , 1979 .
[3] Kenneth D. Forbus. Qualitative Process Theory , 1984, Artif. Intell..
[4] J. Bruner. Culture and Human Development: A New Look , 1990 .
[5] Patrick J. Hayes,et al. The Naive Physics Manifesto , 1990, The Philosophy of Artificial Intelligence.
[6] Michael I. Jordan,et al. Advances in Neural Information Processing Systems 30 , 1995 .
[7] George A. Miller,et al. WordNet: A Lexical Database for English , 1995, HLT.
[8] Patrick J. Hayes,et al. The second naive physics manifesto , 1995 .
[9] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[10] George A. Miller,et al. Using Corpus Statistics and WordNet Relations for Sense Identification , 1998, CL.
[11] John B. Lowe,et al. The Berkeley FrameNet Project , 1998, ACL.
[12] George Karypis,et al. A Comparison of Document Clustering Techniques , 2000 .
[13] Danqi Chen,et al. of the Association for Computational Linguistics: , 2001 .
[14] Mark A. Musen,et al. A Template-Based Approach Toward Acquisition of Logical Sentences , 2002, Intelligent Information Processing.
[15] Pierre Isabelle,et al. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics , 2002, ACL 2002.
[16] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[17] Steven Bird,et al. NLTK: The Natural Language Toolkit , 2002, ACL.
[18] Aron Culotta,et al. Dependency Tree Kernels for Relation Extraction , 2004, ACL.
[19] Andrew Zisserman,et al. A Statistical Approach to Texture Classification from Single Images , 2004, International Journal of Computer Vision.
[20] Adam Kilgarriff,et al. The Senseval-3 English lexical sample task , 2004, SENSEVAL@ACL.
[21] Pietro Perona,et al. Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.
[22] Jian Su,et al. Exploring Various Knowledge in Relation Extraction , 2005, ACL.
[23] Razvan C. Bunescu,et al. A Shortest Path Dependency Kernel for Relation Extraction , 2005, HLT.
[24] Martha Palmer,et al. Verbnet: a broad-coverage, comprehensive verb lexicon , 2005 .
[25] Antonio Torralba,et al. LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.
[26] Andrew Zisserman,et al. Learning Visual Attributes , 2007, NIPS.
[27] G. Griffin,et al. Caltech-256 Object Category Dataset , 2007 .
[28] Benjamin Z. Yao,et al. Introduction to a Large-Scale General Purpose Ground Truth Database: Methodology, Annotation Tool and Benchmarks , 2007, EMMCVPR.
[29] Guodong Zhou,et al. Tree Kernel-Based Relation Extraction with Context-Sensitive Structured Parse Tree Information , 2007, EMNLP.
[30] Brendan T. O'Connor,et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks , 2008, EMNLP.
[31] Andreas Christmann,et al. Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.
[32] Antonio Torralba,et al. 80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[33] Alexei A. Efros,et al. Recognition by association via learning per-exemplar distances , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[34] Larry S. Davis,et al. Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers , 2008, ECCV.
[35] Marwan Mattar,et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments , 2008 .
[36] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[37] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[38] Estevam R. Hruschka,et al. Toward Never Ending Language Learning , 2009, AAAI Spring Symposium: Learning by Reading and Learning to Read.
[39] Larry S. Davis,et al. Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[40] Bo Zhang,et al. StatSnowball: a statistical approach to extracting entity relationships , 2009, WWW '09.
[41] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[42] Roberto Navigli,et al. Word sense disambiguation: A survey , 2009, CSUR.
[43] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[44] Dima Damen,et al. Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[45] Thomas Mensink,et al. Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.
[46] Jennifer Chu-Carroll,et al. Building Watson: An Overview of the DeepQA Project , 2010, AI Mag..
[47] Krista A. Ehinger,et al. SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[48] Fei-Fei Li,et al. Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[49] Cyrus Rashtchian,et al. Every Picture Tells a Story: Generating Sentences from Images , 2010, ECCV.
[50] Vicente Ordonez,et al. Im2Text: Describing Images Using 1 Million Captioned Photographs , 2011, NIPS.
[51] Pietro Perona,et al. The Caltech-UCSD Birds-200-2011 Dataset , 2011 .
[52] Ali Farhadi,et al. Recognition using visual phrases , 2011, CVPR 2011.
[53] Andrew Y. Ng,et al. Semantic Compositionality through Recursive Matrix-Vector Spaces , 2012, EMNLP.
[54] Pietro Perona,et al. Pedestrian Detection: An Evaluation of the State of the Art , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[55] Cordelia Schmid,et al. Weakly Supervised Learning of Interactions between Humans and Objects , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[56] Yi Yang,et al. Recognizing proxemics in personal photos , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[57] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[58] Derek Hoiem,et al. Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.
[59] Christopher Ré,et al. Elementary: Large-Scale Knowledge-Base Construction via Machine Learning and Statistical Inference , 2012, Int. J. Semantic Web Inf. Syst..
[60] C. Lawrence Zitnick,et al. Bringing Semantics into Focus Using Visual Abstraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[61] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[62] Xinlei Chen,et al. NEIL: Extracting Visual Knowledge from Web Data , 2013, 2013 IEEE International Conference on Computer Vision.
[63] Chen Xu,et al. The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding , 2014, International Journal of Computer Vision.
[64] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[65] Silvio Savarese,et al. Understanding Indoor Scenes Using 3D Geometric Phrases , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[66] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[67] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[68] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[69] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[70] R. Fergus,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.
[71] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[72] Jun Zhao,et al. Relation Classification via Convolutional Deep Neural Network , 2014, COLING.
[73] Wei Xu,et al. Explain Images with Multimodal Recurrent Neural Networks , 2014, ArXiv.
[74] Angel X. Chang,et al. Semantic Parsing for Text to 3D Scene Generation , 2014, ACL 2014.
[75] Mario Fritz,et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input , 2014, NIPS.
[76] Zhiyuan Liu,et al. A Unified Model for Word Sense Representation and Disambiguation , 2014, EMNLP.
[77] C. Lawrence Zitnick,et al. Zero-Shot Learning via Visual Abstraction , 2014, ECCV.
[78] Li Fei-Fei,et al. Reasoning about Object Affordances in a Knowledge Base Representation , 2014, ECCV.
[79] Ali Farhadi,et al. Incorporating Scene Context and Object Layout into Appearance Modeling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[80] Ruslan Salakhutdinov,et al. Multimodal Neural Language Models , 2014, ICML.
[81] Joachim Denzler,et al. Nonparametric Part Transfer for Fine-Grained Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[82] Michael S. Bernstein,et al. We Are Dynamo: Overcoming Stalling and Friction in Collective Action for Crowd Workers , 2015, CHI.
[83] Donald Geman,et al. Visual Turing test for computer vision systems , 2015, Proceedings of the National Academy of Sciences.
[84] C. Lawrence Zitnick,et al. Learning Common Sense through Visual Abstraction , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[85] Ronan Collobert,et al. Phrase-based Image Captioning , 2015, ICML.
[86] Li Fei-Fei,et al. Building a Large-scale Multimodal Knowledge Base System for Answering Visual Queries , 2015 .
[87] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[88] Harm de Vries,et al. RMSProp and equilibrated adaptive learning rates for non-convex optimization. , 2015 .
[89] Yoshua Bengio,et al. Equilibrated adaptive learning rates for non-convex optimization , 2015, NIPS.
[90] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[91] Li Fei-Fei,et al. Generating Semantically Precise Scene Graphs from Textual Descriptions for Improved Image Retrieval , 2015, VL@EMNLP.
[92] B. Scholl,et al. Cognition does not affect perception: Evaluating the evidence for “top-down” effects , 2015, Behavioral and Brain Sciences.
[93] Wei Xu,et al. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question , 2015, NIPS.
[94] Pietro Perona,et al. Describing Common Human Visual Actions in Images , 2015, BMVC.
[95] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[96] Hinrich Schütze,et al. AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes , 2015, ACL.
[97] Ali Farhadi,et al. VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[98] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[99] Richard S. Zemel,et al. Exploring Models and Data for Image Question Answering , 2015, NIPS.
[100] Xinlei Chen,et al. Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.
[101] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[102] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[103] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[104] Samy Bengio,et al. Learning semantic relationships for better action retrieval in images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[105] Licheng Yu,et al. Visual Madlibs: Fill in the blank Image Generation and Question Answering , 2015, ArXiv.
[106] Xinlei Chen,et al. Mind's eye: A recurrent visual representation for image caption generation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[107] Edward H. Adelson,et al. Discovering states and transformations in image collections , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[108] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[109] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[110] Richard S. Zemel,et al. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset , 2015, ArXiv.
[111] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[112] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[113] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[114] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[115] David A. Shamma,et al. YFCC100M , 2015, Commun. ACM.
[116] Michael S. Bernstein,et al. Visual Relationship Detection with Language Priors , 2016, ECCV.
[117] Lin Ma,et al. Learning to Answer Questions from Image Using Convolutional Neural Network , 2015, AAAI.
[118] Michael S. Bernstein,et al. Embracing Error to Enable Rapid Crowdsourcing , 2016, CHI.
[119] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).