Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
[1] Junho Park,et al. Difficulty-Aware Simulator for Open Set Recognition , 2022, ECCV.
[2] Yang Wang,et al. Contrastive Learning for Unsupervised Video Highlight Detection , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Yuning Jiang,et al. Supplementary for Paper: Learning Pixel-Level Distinctions for Video Highlight Detection , 2022.
[4] C. Schmid,et al. TubeDETR: Spatio-Temporal Video Grounding with Transformers , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Ying Shan,et al. UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Winston H. Hsu,et al. MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[7] L. Ni,et al. DN-DETR: Accelerate DETR Training by Introducing Query DeNoising , 2022, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Hang Su,et al. DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR , 2022, ICLR.
[9] Thomas Brox,et al. Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives , 2022, AAAI.
[10] A. Schwing,et al. Masked-attention Mask Transformer for Universal Image Segmentation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[11] Yitian Yuan,et al. Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Changsheng Xu,et al. Fast Video Moment Retrieval , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[13] Lu Yuan,et al. Dynamic DETR: End-to-End Object Detection with Dynamic Attention , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[14] Zirui Wang,et al. Temporal Cue Guided Video Highlight Detection with Low-Rank Audio-Visual Fusion , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[15] Yang Wang,et al. Joint Visual and Audio Learning for Video Highlight Detection , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[16] Bingbing Ni,et al. Cross-category Video Highlight Detection via Set-based Learning , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[17] Depu Meng,et al. Conditional DETR for Fast Training Convergence , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[18] Tamara L. Berg,et al. QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries , 2021, ArXiv.
[19] Alexander G. Schwing,et al. Per-Pixel Classification is Not All You Need for Semantic Segmentation , 2021, NeurIPS.
[20] Yann LeCun,et al. MDETR - Modulated Detection for End-to-End Multi-Modal Understanding , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).
[21] Ilya Sutskever,et al. Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.
[22] Ioannis Patras,et al. Video Summarization Using Deep Neural Networks: A Survey , 2021, Proceedings of the IEEE.
[23] Matthieu Cord,et al. Training data-efficient image transformers & distillation through attention , 2020, ICML.
[24] S. Gelly,et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.
[25] Bin Li,et al. Deformable DETR: Deformable Transformers for End-to-End Object Detection , 2020, ICLR.
[26] Dimitris N. Metaxas,et al. Learning Trailer Moments in Full-Length Movies with Co-Contrastive Attention , 2020, ECCV.
[27] Weishi Zheng,et al. MINI-Net: Multiple Instance Ranking Network for Video Highlight Detection , 2020, ECCV.
[28] Nicolas Usunier,et al. End-to-End Object Detection with Transformers , 2020, ECCV.
[29] Hao Zhang,et al. Span-based Localizing Network for Natural Language Video Localization , 2020, ACL.
[30] Junnan Li,et al. DivideMix: Learning with Noisy Labels as Semi-supervised Learning , 2020, ICLR.
[31] Mohit Bansal,et al. TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval , 2020, ECCV.
[32] Mark D. Plumbley,et al. PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition , 2019, IEEE/ACM Transactions on Audio, Speech, and Language Processing.
[33] Jiebo Luo,et al. Learning 2D Temporal Adjacent Networks for Moment Localization with Natural Language , 2019, AAAI.
[34] Wenhao Jiang,et al. Temporally Grounding Language Queries in Videos by Contextual Boundary-aware Prediction , 2019, AAAI.
[35] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Long Chen,et al. DEBUG: A Dense Bottom-Up Grounding Approach for Natural Language Video Localization , 2019, EMNLP.
[37] Bernard Ghanem,et al. Temporal Localization of Moments in Video Collections with Natural Language , 2019, ArXiv.
[38] Yu-Gang Jiang,et al. Semantic Proposal for Activity Localization in Videos via Sentence Query , 2019, AAAI.
[39] Liang Wang,et al. Language-Driven Temporal Activity Localization: A Semantic Matching Reinforcement Learning Model , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Yannis Kalantidis,et al. Less Is More: Learning Highlight Detection From Video Duration , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Silvio Savarese,et al. Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Xiao Liu,et al. Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos , 2019, AAAI.
[43] Jitendra Malik,et al. SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[44] Larry S. Davis,et al. MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[45] Larry S. Davis,et al. Weakly-Supervised Video Summarization Using Variational Encoder-Decoder and Web Prior , 2018, ECCV.
[46] Meng Liu,et al. Attentive Moment Retrieval in Videos , 2018, SIGIR.
[47] Yang Wang,et al. Video Summarization Using Fully Convolutional Sequence Networks , 2018, ECCV.
[48] Ting Yao,et al. Deep Learning for Video Classification and Captioning , 2016, Frontiers of Multimedia Research.
[49] Amit K. Roy-Chowdhury,et al. Weakly Supervised Summarization of Web Videos , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[50] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[51] Trevor Darrell,et al. Localizing Moments in Video with Natural Language , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[52] Michael Lam,et al. Unsupervised Video Summarization with Adversarial LSTM Networks , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[53] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[54] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[55] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[56] Ramakant Nevatia,et al. TALL: Temporal Activity Localization via Language Query , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[57] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[58] Yale Song,et al. To Click or Not To Click: Automatic Selection of Beautiful Thumbnails from Videos , 2016, CIKM.
[59] Tao Mei,et al. Highlight Detection with Pairwise Deep Ranking for First-Person Video Summarization , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[60] Ke Zhang,et al. Video Summarization with Long Short-Term Memory , 2016, ECCV.
[61] Yale Song,et al. Video2GIF: Automatic Generation of Animated GIFs from Video , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[62] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[63] Yale Song,et al. TVSum: Summarizing web videos using titles , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Yongdong Zhang,et al. Multi-task deep visual-semantic embedding for video thumbnail selection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[65] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[66] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[67] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[68] Ali Farhadi,et al. Ranking Domain-Specific Highlights by Analyzing Edited Videos , 2014, ECCV.
[69] Chih-Jen Lin,et al. Large-Scale Video Summarization Using Web-Image Priors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.