Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos
Liqiang Nie | Xianjing Han | Xuemeng Song | Yan Yan | Zongmeng Zhang
[1] Gerard Salton,et al. Research and Development in Information Retrieval , 1982, Lecture Notes in Computer Science.
[2] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[3] Ah Chung Tsoi,et al. The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.
[4] Ying Wu,et al. Discriminative subvolume search for efficient action detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[5] Maja Pantic,et al. Spatiotemporal Localization and Categorization of Human Actions in Unsegmented Image Sequences , 2011, IEEE Transactions on Image Processing.
[6] Bernt Schiele,et al. Grounding Action Descriptions in Videos , 2013, TACL.
[7] Mihai Surdeanu,et al. The Stanford CoreNLP Natural Language Processing Toolkit , 2014, ACL.
[8] Yoshua Bengio,et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling , 2014, ArXiv.
[9] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[10] Bernard Ghanem,et al. ActivityNet: A large-scale video benchmark for human activity understanding , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[12] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.
[13] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[14] Ali Farhadi,et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.
[15] Ming Shao,et al. A Multi-stream Bi-directional Recurrent Neural Network for Fine-Grained Action Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Stan Sclaroff,et al. Learning Activity Progression in LSTMs for Activity Detection and Early Detection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[18] Jiasen Lu,et al. Hierarchical Question-Image Co-Attention for Visual Question Answering , 2016, NIPS.
[19] Max Welling,et al. Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.
[20] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[21] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Diego Marcheggiani,et al. Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling , 2017, EMNLP.
[23] Alex Fout,et al. Protein Interface Prediction using Graph Convolutional Networks , 2017, NIPS.
[24] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[25] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Qi Tian,et al. Cross-modal Moment Localization in Videos , 2018, ACM Multimedia.
[27] Lin Ma,et al. Temporally Grounding Natural Sentence in Video , 2018, EMNLP.
[28] Meng Liu,et al. Attentive Moment Retrieval in Videos , 2018, SIGIR.
[29] Ruslan Salakhutdinov,et al. Gated-Attention Architectures for Task-Oriented Language Grounding , 2017, AAAI.
[30] Wenjun Zeng,et al. Spatio-Temporal Attention-Based LSTM Networks for 3D Action Recognition and Detection , 2018, IEEE Transactions on Image Processing.
[31] Jian Yang,et al. Action-Attending Graphic Neural Network , 2017, IEEE Transactions on Image Processing.
[32] Dahua Lin,et al. Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition , 2018, AAAI.
[33] Leonid Sigal,et al. G3raphGround: Graph-Based Language Grounding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[34] Liang Wang,et al. Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Ramakant Nevatia,et al. MAC: Mining Activity Concepts for Language-Based Temporal Localization , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).
[36] Tao Mei,et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression , 2018, AAAI.
[37] Lei Chen,et al. Object Grounding via Iterative Context Reasoning , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).
[38] Shiliang Pu,et al. Video Relation Detection with Spatio-Temporal Graph , 2019, ACM Multimedia.
[39] Bin Jiang,et al. Cross-Modal Video Moment Retrieval with Spatial and Language-Temporal Attention , 2019, ICMR.
[40] Yuan Luo,et al. Graph Convolutional Networks for Text Classification , 2018, AAAI.
[41] Larry S. Davis,et al. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Xiao Liu,et al. Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos , 2019, AAAI.
[43] Kate Saenko,et al. Multilevel Language and Vision Integration for Text-to-Clip Retrieval , 2018, AAAI.
[44] Yu-Gang Jiang,et al. Semantic Proposal for Activity Localization in Videos via Sentence Query , 2019, AAAI.
[45] Amit K. Roy-Chowdhury,et al. Weakly Supervised Video Moment Retrieval From Text Queries , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Yu Cheng,et al. Relation-Aware Graph Attention Network for Visual Question Answering , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[47] Jiebo Luo,et al. Localizing Natural Language in Videos , 2019, AAAI.
[48] Alexander G. Hauptmann,et al. ExCL: Extractive Clip Localization Using Natural Language Descriptions , 2019, NAACL.
[49] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[50] Zhou Zhao,et al. Cross-Modal Interaction Networks for Query-Based Moment Retrieval in Videos , 2019, SIGIR.
[51] Runhao Zeng,et al. Graph Convolutional Networks for Temporal Action Localization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[52] Runhao Zeng,et al. Breaking Winner-Takes-All: Iterative-Winners-Out Networks for Weakly Supervised Temporal Action Localization , 2019, IEEE Transactions on Image Processing.
[53] Le Yang,et al. Revisiting Anchor Mechanisms for Temporal Action Localization , 2020, IEEE Transactions on Image Processing.
[54] James M. Rehg,et al. Tripping through time: Efficient Localization of Activities in Videos , 2019, BMVC.
[55] Furu Wei,et al. VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.
[56] Ali K. Thabet,et al. G-TAD: Sub-Graph Localization for Temporal Action Detection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Xinhang Song,et al. Scene Recognition With Prototype-Agnostic Scene Layout , 2019, IEEE Transactions on Image Processing.
[58] Runhao Zeng,et al. Dense Regression Network for Video Grounding , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[59] Bo Wang,et al. Learning Long-Term Structural Dependencies for Video Salient Object Detection , 2020, IEEE Transactions on Image Processing.
[60] Yiannis Andreopoulos,et al. Graph-based Spatial-temporal Feature Learning for Neuromorphic Vision Sensing , 2019, ArXiv.
[61] Qinghua Zheng,et al. Semantics-Preserving Graph Propagation for Zero-Shot Object Detection , 2020, IEEE Transactions on Image Processing.
[62] Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction , 2019, AAAI.
[63] Qiong Liu,et al. MV-GNN: Multi-View Graph Neural Network for Compression Artifacts Reduction , 2020, IEEE Transactions on Image Processing.
[64] Jiebo Luo,et al. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language , 2019, AAAI.
[65] Guanbin Li,et al. Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video , 2020, AAAI.
[66] Heng Tao Shen,et al. Temporal Reasoning Graph for Activity Recognition , 2019, IEEE Transactions on Image Processing.