Paper information

GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval

Mike Zheng Shou | Stan Weixian Lei | Matt Feiszli | Yuxuan Wang | Difei Gao | Licheng Yu

[1] Mike Zheng Shou, et al. On Pursuit of Designing Multi-modal Transformer for Video Grounding, 2021, EMNLP.

[2] Fan Yang, et al. Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss, 2021, ArXiv.

[3] Ping Luo, et al. End-to-End Dense Video Captioning with Parallel Decoding, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[4] Andrew Zisserman, et al. Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5] Jesús Andrés Portillo-Quintero, et al. A Straightforward Framework For Video Retrieval Using CLIP, 2021, MCPR.

[6] Weiyao Wang, et al. Generic Event Boundary Detection: A Benchmark for Event Segmentation, 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[7] Yi Yang, et al. ActBERT: Learning Global-Local Video-Text Representations, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Bohyung Han, et al. Local-Global Video-Text Interactions for Temporal Grounding, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9] Runhao Zeng, et al. Dense Regression Network for Video Grounding, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Esa Rahtu, et al. Multi-modal Dense Video Captioning, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[11] Xilin Chen, et al. UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation, 2020, ArXiv.

[12] Jiebo Luo, et al. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language, 2019, AAAI.

[13] Iryna Gurevych, et al. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks, 2019, EMNLP.

[14] Xin Wang, et al. VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15] Trevor Darrell, et al. Robust Change Captioning, 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[16] Ramakant Nevatia, et al. MAC: Mining Activity Concepts for Language-Based Temporal Localization, 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[17] Licheng Yu, et al. TVQA: Localized, Compositional Video Question Answering, 2018, EMNLP.

[18] Harsh Jhamtani, et al. Learning to Describe Differences Between Pairs of Similar Images, 2018, EMNLP.

[19] Tao Mei, et al. Jointly Localizing and Describing Events for Dense Video Captioning, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Tao Mei, et al. To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression, 2018, AAAI.

[21] Gang Li, et al. Change Detection in Heterogenous Remote Sensing Images via Homogeneous Pixel Transformation, 2018, IEEE Transactions on Image Processing.

[22] Andrew Zisserman, et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Fabio Viola, et al. The Kinetics Human Action Video Dataset, 2017, ArXiv.

[24] Matthieu Cord, et al. MUTAN: Multimodal Tucker Fusion for Visual Question Answering, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25] Limin Wang, et al. Temporal Segment Networks for Action Recognition in Videos, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26] Ramakant Nevatia, et al. TALL: Temporal Activity Localization via Language Query, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[27] Juan Carlos Niebles, et al. Dense-Captioning Events in Videos, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[28] Chenliang Xu, et al. Towards Automatic Learning of Procedures From Web Instructional Videos, 2017, AAAI.

[29] Stephen Gould, et al. SPICE: Semantic Propositional Image Caption Evaluation, 2016, ECCV.

[30] Germán Ros, et al. Street-view change detection with deconvolutional networks, 2016, Autonomous Robots.

[31] Tao Mei, et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Kaiming He, et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Terrance E. Boult, et al. Towards Open World Recognition, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35] Lorenzo Torresani, et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[36] C. Lawrence Zitnick, et al. CIDEr: Consensus-based image description evaluation, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37] Bernt Schiele, et al. Grounding Action Descriptions in Videos, 2013, TACL.

[38] Jeffrey M. Zacks, et al. Event perception, 2011, Scholarpedia.

[39] David L. Chen, et al. Collecting Highly Parallel Data for Paraphrase Evaluation, 2011, ACL.

[40] Chin-Yew Lin, et al. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, ACL 2004.

[41] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[42] Tony Lindeberg, et al. Feature Detection with Automatic Scale Selection, 1998, International Journal of Computer Vision.

[43] Shiyong Cui, et al. Building Change Detection Based on Satellite Stereo Imagery and Digital Surface Models, 2014, IEEE Transactions on Geoscience and Remote Sensing.