Scene Graph Inference via Multi-Scale Context Modeling
Mohan S. Kankanhalli | Yongkang Wong | Yuting Su | Anan Liu | Ning Xu | Weizhi Nie
[1] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[2] Mohan S. Kankanhalli,et al. Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[3] Zhou Yu,et al. Beyond Bilinear: Generalized Multimodal Factorized High-Order Pooling for Visual Question Answering , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[4] Serge J. Belongie,et al. Context based object categorization: A critical survey , 2010, Comput. Vis. Image Underst.
[5] Yejin Choi,et al. Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.
[6] Li Fei-Fei,et al. Knowledge Acquisition for Visual Question Answering via Iterative Querying , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.
[8] Jitendra Malik,et al. Contextual Action Recognition with R*CNN , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[9] Basura Fernando,et al. SPICE: Semantic Propositional Image Caption Evaluation , 2016, ECCV.
[10] Stefan Lee,et al. Graph R-CNN for Scene Graph Generation , 2018, ECCV.
[11] Yongdong Zhang,et al. STAT: Spatial-Temporal Attention Mechanism for Video Captioning , 2020, IEEE Transactions on Multimedia.
[12] Ian D. Reid,et al. Towards Context-Aware Interaction Recognition for Visual Relationship Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[13] Zhou Yu,et al. SPRNet: Single-Pixel Reconstruction for One-Stage Instance Segmentation , 2019, IEEE Transactions on Cybernetics.
[14] Qionghai Dai,et al. Cross-Modality Bridging and Knowledge Transferring for Image Understanding , 2019, IEEE Transactions on Multimedia.
[15] Yongdong Zhang,et al. Multi-Level Policy and Reward Reinforcement Learning for Image Captioning , 2018, IJCAI.
[16] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Luming Zhang,et al. Multiview and Multimodal Pervasive Indoor Localization , 2017, ACM Multimedia.
[18] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[19] Weisi Lin,et al. Learning Markov Clustering Networks for Scene Text Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] Danfei Xu,et al. Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Frank Keller,et al. Image Description using Visual Dependency Representations , 2013, EMNLP.
[22] Xiaogang Wang,et al. ViP-CNN: Visual Phrase Guided Convolutional Neural Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Mohan S. Kankanhalli,et al. Interact as You Intend: Intention-Driven Human-Object Interaction Detection , 2018, IEEE Transactions on Multimedia.
[24] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[25] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[26] Jun Yu,et al. Multimodal Face-Pose Estimation With Multitask Manifold Deep Learning , 2019, IEEE Transactions on Industrial Informatics.
[27] Michael S. Bernstein,et al. Visual Relationship Detection with Language Priors , 2016, ECCV.
[28] Fei-Fei Li,et al. Grouplet: A structured image representation for recognizing human and object interactions , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[29] Yuxin Peng,et al. SSDH: Semi-Supervised Deep Hashing for Large Scale Image Retrieval , 2016, IEEE Transactions on Circuits and Systems for Video Technology.
[30] Heng Tao Shen,et al. Exploring Auxiliary Context: Discrete Semantic Transfer Hashing for Scalable Image Retrieval , 2018, IEEE Transactions on Neural Networks and Learning Systems.
[31] Karl Stratos,et al. Understanding and predicting importance in images , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[32] Mohan S. Kankanhalli,et al. Learning to Detect Human-Object Interactions With Knowledge , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[33] Bo Dai,et al. Detecting Visual Relationships with Deep Relational Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[34] Shih-Fu Chang,et al. PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[35] Lucy Vanderwende,et al. Learning the Visual Interpretation of Sentences , 2013, 2013 IEEE International Conference on Computer Vision.
[36] Meng Wang,et al. Image-Based Three-Dimensional Human Pose Recovery by Multiview Locality-Sensitive Sparse Retrieval , 2015, IEEE Transactions on Industrial Electronics.
[37] Antonio Torralba,et al. Exploiting hierarchical context on a large database of object categories , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[38] Armand Joulin,et al. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping , 2014, NIPS.
[39] Michael Isard,et al. A Multi-View Embedding Space for Modeling Internet Images, Tags, and Their Semantics , 2012, International Journal of Computer Vision.
[40] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[41] Jianfei Cai,et al. Shuffle-Then-Assemble: Learning Object-Agnostic Visual Relationship Features , 2018, ECCV.
[42] Meng Wang,et al. Multimodal Deep Autoencoder for Human Pose Recovery , 2015, IEEE Transactions on Image Processing.
[43] Qingming Huang,et al. Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition , 2020, IEEE Transactions on Neural Networks and Learning Systems.
[44] Michael S. Bernstein,et al. Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Sanja Fidler,et al. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[46] Xuelong Li,et al. Learning Parts-Based and Global Representation for Image Classification , 2018, IEEE Transactions on Circuits and Systems for Video Technology.
[47] Larry S. Davis,et al. Beyond Nouns: Exploiting Prepositions and Comparative Adjectives for Learning Visual Classifiers , 2008, ECCV.
[48] Jonathan Berant,et al. Learning to generalize to new compositions in image understanding , 2016, ArXiv.
[49] Shih-Fu Chang,et al. Visual Translation Embedding Network for Visual Relation Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[50] Zhou Yu,et al. Multimodal Transformer With Multi-View Visual Representation for Image Captioning , 2019, IEEE Transactions on Circuits and Systems for Video Technology.
[51] Samy Bengio,et al. Learning semantic relationships for better action retrieval in images , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[52] Chao Xu,et al. Robust Visual Tracking via Multi-Scale Spatio-Temporal Context Learning , 2018, IEEE Transactions on Circuits and Systems for Video Technology.
[53] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[54] Weijian Li,et al. Attentive Relational Networks for Mapping Images to Scene Graphs , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[56] Xiaogang Wang,et al. Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation , 2018, ECCV.
[57] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[58] Yongdong Zhang,et al. Multi-Level Policy and Reward-Based Deep Reinforcement Learning Framework for Image Captioning , 2020, IEEE Transactions on Multimedia.
[59] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017.
[60] Shi-Min Hu,et al. S4Net: Single stage salient-instance segmentation , 2017, Computational Visual Media.
[61] An-An Liu,et al. 3D Object Retrieval Based on Multi-View Latent Variable Model , 2019, IEEE Transactions on Circuits and Systems for Video Technology.
[62] Juan-Zi Li,et al. Explainable and Explicit Visual Reasoning Over Scene Graphs , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Xiaogang Wang,et al. Scene Graph Generation from Objects, Phrases and Region Captions , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[64] Svetlana Lazebnik,et al. Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).
[65] Guosheng Lin,et al. Exploring Context with Deep Structured Models for Semantic Segmentation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[66] Cees Snoek,et al. COSTA: Co-Occurrence Statistics for Zero-Shot Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[67] Yongdong Zhang,et al. Dual-Stream Recurrent Neural Network for Video Captioning , 2019, IEEE Transactions on Circuits and Systems for Video Technology.
[68] Ke Lu,et al. Heterogeneous Domain Adaptation Through Progressive Alignment , 2019, IEEE Transactions on Neural Networks and Learning Systems.
[69] Gang Wang,et al. Context Contrasted Feature and Gated Multi-scale Aggregation for Scene Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[70] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[71] Ali Farhadi,et al. Recognition using visual phrases , 2011, CVPR 2011.
[72] Ali Farhadi,et al. Incorporating Scene Context and Object Layout into Appearance Modeling , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[73] Mohan S. Kankanhalli,et al. Dual-Glance Model for Deciphering Social Relationships , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[74] Long Chen,et al. Counterfactual Critic Multi-Agent Training for Scene Graph Generation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[75] Jianfei Cai,et al. Auto-Encoding Scene Graphs for Image Captioning , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Larry S. Davis,et al. Visual Relationship Detection with Internal and External Linguistic Knowledge Distillation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[77] Jun Yu,et al. Hierarchical Deep Click Feature Prediction for Fine-Grained Image Recognition , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.