Captioning Videos Using Large-Scale Image Corpus
Yang Yang | Fumin Shen | Jinhui Tang | Xiao-Yu Du | Liu Yang | Zhiguang Qin