Zhou Yu | Yijun Song | Lin Ma | Jun Yu | Jingwen Wang
[1] Yahong Han,et al. Multi-modal Circulant Fusion for Video-to-Language and Backward , 2018, IJCAI.
[2] Zhijie Lin,et al. Weakly-Supervised Video Moment Retrieval via Semantic Completion Network , 2020, AAAI.
[3] Wenhan Luo,et al. Look Closer to Ground Better: Weakly-Supervised Temporal Grounding of Sentence in Video , 2020, ArXiv.
[4] Yu-Gang Jiang,et al. Semantic Proposal for Activity Localization in Videos via Sentence Query , 2019, AAAI.
[5] Juan Carlos Niebles,et al. Dense-Captioning Events in Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[6] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[8] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[9] Shilei Wen,et al. BMN: Boundary-Matching Network for Temporal Action Proposal Generation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[10] Tao Mei,et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression , 2018, AAAI.
[11] Liang Wang,et al. Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.
[13] Ramakant Nevatia,et al. MAC: Mining Activity Concepts for Language-Based Temporal Localization , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).
[14] Amit K. Roy-Chowdhury,et al. Weakly Supervised Video Moment Retrieval From Text Queries , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Larry S. Davis,et al. WSLLN: Weakly Supervised Natural Language Localization Networks , 2019, EMNLP.
[16] Limin Wang,et al. Temporal Action Detection with Structured Segment Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[17] James M. Rehg,et al. Tripping through time: Efficient Localization of Activities in Videos , 2019, BMVC.
[18] Ming Yang,et al. BSN: Boundary Sensitive Network for Temporal Action Proposal Generation , 2018, ECCV.
[19] Kate Saenko,et al. Multilevel Language and Vision Integration for Text-to-Clip Retrieval , 2018, AAAI.
[20] Lin Ma,et al. Temporally Grounding Natural Sentence in Video , 2018, EMNLP.
[21] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.
[22] Juan Carlos Niebles,et al. Temporal Modular Networks for Retrieving Complex Compositional Activities in Videos , 2018, ECCV.
[23] Meng Liu,et al. Attentive Moment Retrieval in Videos , 2018, SIGIR.
[24] Jingwen Wang,et al. Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction , 2020, AAAI.
[25] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[26] Yitian Yuan,et al. Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[28] Sanja Fidler,et al. Skip-Thought Vectors , 2015, NIPS.
[29] Bernard Ghanem,et al. SST: Single-Stream Temporal Action Proposals , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.