Video Content Understanding Using Text
[1] Andrea Vedaldi et al. VLFeat: an open and portable library of computer vision algorithms, 2010, ACM Multimedia.
[2] Jürgen Schmidhuber et al. Long Short-Term Memory, 1997, Neural Computation.
[3] Jorma Laaksonen et al. PicSOM Experiments in TRECVID 2018, 2015, TRECVID.
[4] Wei Ping et al. Marginal Structured SVM with Hidden Variables, 2014, ICML.
[5] Mario Fritz et al. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input, 2014, NIPS.
[6] Larry S. Davis et al. Image ranking and retrieval based on multi-attribute queries, 2011, CVPR 2011.
[7] Alexei A. Efros et al. Everybody Dance Now, 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[8] Dong Xu et al. Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction, 2006, TRECVID.
[9] Trevor Darrell et al. Detection bank: an object detection based video representation for multimedia event recognition, 2012, ACM Multimedia.
[10] Harriet J. Nock et al. Discriminative model fusion for semantic concept detection and annotation in video, 2003, ACM Multimedia.
[11] Georges Quénot et al. TRECVID 2015 - An Overview of the Goals, Tasks, Data, Evaluation Mechanisms and Metrics, 2011, TRECVID.
[12] Thomas S. Huang et al. Generative Image Inpainting with Contextual Attention, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[13] Robert A. Wagner et al. Order-n correction for regular languages, 1974, CACM.
[14] Marcel Worring et al. Adding Semantics to Detectors for Video Retrieval, 2007, IEEE Transactions on Multimedia.
[15] Hui Zhang et al. Kneser-Ney Smoothing on Expected Counts, 2014, ACL.
[16] S. Sathiya Keerthi et al. Efficient algorithms for ranking with SVMs, 2010, Information Retrieval.
[17] Thorsten Joachims et al. Learning structural SVMs with latent variables, 2009, ICML '09.
[18] Thorsten Joachims et al. Optimizing search engines using clickthrough data, 2002, KDD.
[19] Yann Dauphin et al. Language Modeling with Gated Convolutional Networks, 2016, ICML.
[20] Jongwook Choi et al. End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Yoshua Bengio et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.
[22] Mario Fritz et al. Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[23] Fabio Viola et al. The Kinetics Human Action Video Dataset, 2017, ArXiv.
[24] Juha Karhunen et al. Bidirectional Recurrent Neural Networks as Generative Models, 2015, NIPS.
[25] Jeffrey Dean et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[26] Jeff Donahue et al. Efficient Video Generation on Complex Datasets, 2019, ArXiv.
[27] Ali Farhadi et al. Describing objects by their attributes, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[28] Haroon Idrees et al. UCF-CRCV at TRECVID 2015: Semantic Indexing, 2013, TRECVID.
[29] Hugo Larochelle et al. Modulating early visual processing by language, 2017, NIPS.
[30] Jiebo Luo et al. Large-scale multimodal semantic concept detection for consumer video, 2007, MIR '07.
[31] Stanley F. Chen et al. An Empirical Study of Smoothing Techniques for Language Modeling, 1996, ACL.
[32] Dong Liu et al. EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video, 2015, ACM Multimedia.
[33] Marcel Worring et al. Concept-Based Video Retrieval, 2009, Found. Trends Inf. Retr.
[34] Sepp Hochreiter et al. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium, 2017, NIPS.
[35] Lois M. L. Delcambre et al. Discounted Cumulated Gain Based Evaluation of Multiple-Query IR Sessions, 2008, ECIR.
[36] Tao Mei et al. To Create What You Tell: Generating Videos from Captions, 2017, ACM Multimedia.
[37] Barbara Caputo et al. Recognizing human actions: a local SVM approach, 2004, Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004).
[38] Jason J. Corso et al. Action bank: A high-level representation of activity in video, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[39] Nicu Sebe et al. Animating Arbitrary Objects via Deep Motion Transfer, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Emine Yilmaz et al. Inferred AP: Estimating Average Precision with Incomplete Judgments, 2006.
[41] Mubarak Shah et al. Learning a Multi-concept Video Retrieval Model with Multiple Latent Variables, 2016, 2016 IEEE International Symposium on Multimedia (ISM).
[42] Christopher Joseph Pal et al. Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research, 2015, ArXiv.
[43] Gabriel Kreiman et al. Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning, 2016, ICLR.
[44] Li Li et al. A Survey on Visual Content-Based Video Indexing and Retrieval, 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).
[45] Ladislau Bölöni et al. Pay Attention! - Robustifying a Deep Visuomotor Policy Through Task-Focused Visual Attention, 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Shih-Fu Chang et al. Query-Adaptive Fusion for Multimodal Search, 2008, Proceedings of the IEEE.
[47] Cordelia Schmid et al. TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, 2009, 2009 IEEE 12th International Conference on Computer Vision.
[48] Andrew Zisserman et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[49] Alexander H. Waibel et al. Multimodal error correction for speech user interfaces, 2001, TCHI.
[50] Yann Dauphin et al. Convolutional Sequence to Sequence Learning, 2017, ICML.
[51] Steven Bird et al. NLTK: The Natural Language Toolkit, 2002, ACL.
[52] Alex Acero et al. Whistler: a trainable text-to-speech system, 1996, Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP '96).
[53] Yitong Li et al. Video Generation From Text, 2017, AAAI.
[54] Yuichi Yoshida et al. Spectral Normalization for Generative Adversarial Networks, 2018, ICLR.
[55] Margaret Mitchell et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[56] Yash Goyal et al. Yin and Yang: Balancing and Answering Binary Visual Questions, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[57] Rong Yan et al. Semantic concept-based query expansion and re-ranking for multimedia retrieval, 2007, ACM Multimedia.
[58] Quoc V. Le et al. Distributed Representations of Sentences and Documents, 2014, ICML.
[59] Sergey Levine et al. Unsupervised Learning for Physical Interaction through Video Prediction, 2016, NIPS.
[60] Marcel Worring et al. Efficient Genre-Specific Semantic Video Indexing, 2012, IEEE Transactions on Multimedia.
[61] Lorenzo Torresani et al. Learning Spatiotemporal Features with 3D Convolutional Networks, 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[62] Samy Bengio et al. A Discriminative Kernel-Based Approach to Rank Images from Text Queries, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[63] Dumitru Erhan et al. Going deeper with convolutions, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[64] Paul Over et al. High-level feature detection from video in TRECVid: a 5-year retrospective of achievements, 2009.
[65] Richard S. Zemel et al. Exploring Models and Data for Image Question Answering, 2015, NIPS.
[66] Tobias Hinz et al. Semantic Object Accuracy for Generative Text-to-Image Synthesis, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[67] Quoc V. Le et al. Document Embedding with Paragraph Vectors, 2015, ArXiv.
[68] Adam Coates et al. Deep Voice: Real-time Neural Text-to-Speech, 2017, ICML.
[69] Amnon Shashua et al. Ranking with Large Margin Principle: Two Approaches, 2002, NIPS.
[70] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[71] Mubarak Shah et al. UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild, 2012, ArXiv.
[72] Vineeth N. Balasubramanian et al. Attentive Semantic Video Generation Using Captions, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[73] Yoshua Bengio et al. Generative Adversarial Nets, 2014, NIPS.
[74] Kuldip K. Paliwal et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.
[75] Sanja Fidler et al. MovieQA: Understanding Stories in Movies through Question-Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[76] Mubarak Shah et al. Visual Text Correction, 2018, ECCV.
[77] Xin Wang et al. Cross-Modal Dual Learning for Sentence-to-Video Generation, 2019, ACM Multimedia.
[78] Fei-Fei Li et al. Large-Scale Video Classification with Convolutional Neural Networks, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[79] Alexander J. Smola et al. Stacked Attention Networks for Image Question Answering, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[80] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.
[81] Jing Yang et al. Edge-Aware Deep Image Deblurring, 2019, Neurocomputing.
[82] John R. Smith et al. IBM Research TRECVID-2009 Video Retrieval System, 2009, TRECVID.
[83] Yoshua Bengio et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.
[84] Ruben Villegas et al. Learning to Generate Long-term Future via Hierarchical Prediction, 2017, ICML.
[85] Samy Bengio et al. Show and tell: A neural image caption generator, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[86] Jian Sun et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[87] 拓海 杉山 et al. Study report on “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”, 2017.
[88] Li Fei-Fei et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.
[89] Zhe Gan et al. AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[90] Jongwook Choi et al. Video Captioning and Retrieval Models with Semantic Attention, 2016, ArXiv.
[91] Jaakko Lehtinen et al. Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017, ICLR.
[92] Dong Wang et al. Video search in concept subspace: a text-like paradigm, 2007, CIVR '07.
[93] Vladimir N. Vapnik et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[94] Jin Zhao et al. Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting, 2006, CIVR.
[95] Dong Wang et al. The importance of query-concept-mapping for automatic video retrieval, 2007, ACM Multimedia.
[96] Daniel Rueckert et al. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[97] Jason Weston et al. Large-scale Simple Question Answering with Memory Networks, 2015, ArXiv.
[98] Ali Farhadi et al. VisKE: Visual knowledge extraction and question answering by visual verification of relation phrases, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[99] David G. Lowe et al. Distinctive Image Features from Scale-Invariant Keypoints, 2004.
[100] Jiebo Luo et al. Utilizing semantic word similarity measures for video retrieval, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[101] Mubarak Shah et al. Fast Zero-Shot Image Tagging, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[102] Mubarak Shah et al. Multi-modal Capsule Routing for Actor and Action Video Segmentation Conditioned on Natural Language Queries, 2018, ArXiv.
[103] Tibério S. Caetano et al. Reverse Multi-Label Learning, 2010, NIPS.
[104] Mubarak Shah et al. Video Fill In the Blank Using LR/RL LSTMs with Spatial-Temporal Attentions, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[105] Mubarak Shah et al. DaMN - Discriminative and Mutually Nearest: Exploiting Pairwise Category Proximity for Video Action Recognition, 2014, ECCV.
[106] Thore Graepel et al. Large Margin Rank Boundaries for Ordinal Regression, 2000.
[107] Tegan Maharaj et al. A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[108] Wojciech Zaremba et al. Improved Techniques for Training GANs, 2016, NIPS.
[109] Léon Bottou et al. Wasserstein GAN, 2017, ArXiv.
[110] Jan Kautz et al. MoCoGAN: Decomposing Motion and Content for Video Generation, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[111] Andrew W. Fitzgibbon et al. Efficient Object Category Recognition Using Classemes, 2010, ECCV.
[112] Hao Su et al. Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification, 2010, NIPS.
[113] Bernt Schiele et al. A dataset for Movie Description, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[114] Xiaogang Wang et al. Video Generation From Single Semantic Label Map, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[115] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[116] D. Opitz et al. Popular Ensemble Methods: An Empirical Study, 1999, J. Artif. Intell. Res.
[117] In So Kweon et al. Deep Video Inpainting, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[118] Richard Socher et al. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, 2015, ICML.
[119] Cees G. M. Snoek et al. The MediaMill at TRECVID 2013: Searching concepts, Objects, Instances and events in video, 2013, TRECVID.
[120] Clement T. Yu et al. Techniques and Systems for Image and Video Retrieval, 1999, IEEE Trans. Knowl. Data Eng.
[121] Thomas Hofmann et al. Support vector machine learning for interdependent and structured output spaces, 2004, ICML.
[122] Francesca Murabito et al. VOS-GAN: Adversarial Learning of Visual-Temporal Dynamics for Unsupervised Dense Prediction in Videos, 2018, ArXiv.
[123] Subhashini Venugopalan et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks, 2014, NAACL.
[124] Rong Yan et al. How many high-level concepts will fill the semantic gap in news video retrieval?, 2007, CIVR '07.
[125] Jason Weston et al. Memory Networks, 2014, ICLR.
[126] Mubarak Shah et al. Video Classification Using Semantic Concept Co-occurrences, 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[127] Cees Snoek et al. Actor and Action Video Segmentation from a Sentence, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[128] Robert L. Mercer et al. Context based spelling correction, 1991, Inf. Process. Manag.
[129] Thomas Mensink et al. Improving the Fisher Kernel for Large-Scale Image Classification, 2010, ECCV.
[130] Zhuowen Tu et al. Harvesting Mid-level Visual Concepts from Large-Scale Internet Images, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[131] Chenliang Xu et al. Can humans fly? Action understanding with multiple classes of actors, 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[132] Li Fei-Fei et al. DenseCap: Fully Convolutional Localization Networks for Dense Captioning, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[133] Alan F. Smeaton et al. A Comparison of Score, Rank and Probability-Based Fusion Methods for Video Shot Retrieval, 2005, CIVR.
[134] Rong Yan et al. The combination limit in multimedia retrieval, 2003, MULTIMEDIA '03.
[135] Rongrong Ji et al. Weak attributes for large-scale image retrieval, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[136] Chung-Hsien Wu et al. Sentence Correction Incorporating Relative Position and Parse Template Language Models, 2010, IEEE Transactions on Audio, Speech, and Language Processing.
[137] Phil Blunsom et al. A Convolutional Neural Network for Modelling Sentences, 2014, ACL.
[138] Alan F. Smeaton. Techniques used and open challenges to the analysis, indexing and retrieval of digital video, 2007, Inf. Syst.
[139] Nitish Srivastava et al. Unsupervised Learning of Video Representations using LSTMs, 2015, ICML.
[140] Tao Mei et al. Correlative multi-label video annotation, 2007, ACM Multimedia.
[141] Deyu Meng et al. Bridging the Ultimate Semantic Gap: A Semantic Search Engine for Internet Videos, 2015, ICMR.
[142] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.
[143] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.
[144] Marcel Worring et al. The challenge problem for automated detection of 101 semantic concepts in multimedia, 2006, MM '06.
[145] Christoph H. Lampert et al. Attribute-Based Classification for Zero-Shot Visual Object Categorization, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[146] Han Zhang et al. Self-Attention Generative Adversarial Networks, 2018, ICML.
[147] Richard Socher et al. Dynamic Memory Networks for Visual and Textual Question Answering, 2016, ICML.
[148] Yang Wang et al. Image Retrieval with Structured Object Queries Using Latent Ranking SVM, 2012, ECCV.
[149] Tao Chen et al. DeepSentiBank: Visual Sentiment Concept Classification with Deep Convolutional Neural Networks, 2014, ArXiv.
[150] Sergey Levine et al. Stochastic Adversarial Video Prediction, 2018, ArXiv.
[151] Cordelia Schmid et al. Action Recognition with Improved Trajectories, 2013, 2013 IEEE International Conference on Computer Vision.
[152] Chong-Wah Ngo et al. Selection of Concept Detectors for Video Search by Ontology-Enriched Semantic Spaces, 2008, IEEE Transactions on Multimedia.
[153] Mubarak Shah et al. Complex Events Detection Using Data-Driven Concepts, 2012, ECCV.