Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
暂无分享,去创建一个
Yuting Zhang | Honglak Lee | Zhiyuan He | Luyao Yuan | Yijie Guo | I-An Huang | Honglak Lee | Y. Zhang | Yijie Guo | Zhiyuan He | Luyao Yuan | I-An Huang
[1] Jitendra Malik,et al. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Alon Lavie,et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.
[3] Dan Klein,et al. Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[4] Mario Fritz,et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Vicente Ordonez,et al. ReferItGame: Referring to Objects in Photographs of Natural Scenes , 2014, EMNLP.
[6] Bohyung Han,et al. Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[8] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.
[9] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[10] Trevor Darrell,et al. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[13] Phil Blunsom,et al. A Convolutional Neural Network for Modelling Sentences , 2014, ACL.
[14] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.
[15] Geoffrey E. Hinton,et al. Generating Text with Recurrent Neural Networks , 2011, ICML.
[16] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Bernt Schiele,et al. Generative Adversarial Text to Image Synthesis , 2016, ICML.
[18] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[19] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.
[20] Alexander J. Smola,et al. Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Xinlei Chen,et al. Mind's eye: A recurrent visual representation for image caption generation , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Alan L. Yuille,et al. Generation and Comprehension of Unambiguous Object Descriptions , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Yuting Zhang,et al. Improving object detection with deep convolutional networks via Bayesian optimization and structured prediction , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[26] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[27] Trevor Darrell,et al. Segmentation from Natural Language Expressions , 2016, ECCV.
[28] Alex Graves,et al. Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.
[29] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Lukás Burget,et al. Recurrent neural network based language model , 2010, INTERSPEECH.
[31] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[32] Yi Li,et al. R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.
[33] Xiaogang Wang,et al. DeepID-Net: Deformable deep convolutional neural networks for object detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[35] Andrew L. Maas. Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .
[36] Xiang Zhang,et al. Character-level Convolutional Networks for Text Classification , 2015, NIPS.
[37] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[38] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[39] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[40] Andreas Geiger,et al. Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[41] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[42] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[44] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[45] Licheng Yu,et al. Modeling Context in Referring Expressions , 2016, ECCV.
[46] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.
[47] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[48] Bernt Schiele,et al. Learning Deep Representations of Fine-Grained Visual Descriptions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.
[50] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[51] Larry S. Davis,et al. Modeling Context Between Objects for Referring Expression Understanding , 2016, ECCV.
[52] Trevor Darrell,et al. Grounding of Textual Phrases in Images by Reconstruction , 2015, ECCV.
[53] Yoon Kim,et al. Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.
[54] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[55] C. Lawrence Zitnick,et al. Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.
[56] Trevor Darrell,et al. Natural Language Object Retrieval , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Hao Zhou,et al. Visual Relationship Detection with Relative Location Mining , 2019, ACM Multimedia.
[58] Li Fei-Fei,et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[60] Lawrence D. Jackel,et al. Backpropagation Applied to Handwritten Zip Code Recognition , 1989, Neural Computation.