暂无分享,去创建一个
[1] Yanjun Qi,et al. Polynomial Semantic Indexing , 2009, NIPS.
[2] Tao Mei,et al. MSR-VTT: A Large Video Description Dataset for Bridging Video and Language , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[3] Cees Snoek,et al. Objects2action: Classifying and Localizing Actions without Any Video Example , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[4] Dong Liu,et al. EventNet: A Large Scale Structured Concept Library for Complex Event Detection in Video , 2015, ACM Multimedia.
[5] Chong-Wah Ngo,et al. Click-through-based Subspace Learning for Image Search , 2014, ACM Multimedia.
[6] Xiaoyong Du,et al. Zero-shot Image Tagging by Hierarchical Semantic Embedding , 2015, SIGIR.
[7] Wei-Ying Ma,et al. Bag-of-Words Based Deep Neural Network for Image Retrieval , 2014, ACM Multimedia.
[8] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.
[9] Stefan Carlsson,et al. CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.
[10] Peter Young,et al. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics , 2013, J. Artif. Intell. Res..
[11] Dennis Koelma,et al. The ImageNet Shuffle: Reorganized Pre-training for Video Event Detection , 2016, ICMR.
[12] Peter Young,et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions , 2014, TACL.
[13] Lior Wolf,et al. Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation , 2014, ArXiv.
[14] Roger Levy,et al. On the Role of Correlation and Abstraction in Cross-Modal Multimedia Retrieval , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] Yi Yang,et al. Fast and Accurate Content-based Semantic Search in 100M Internet Videos , 2015, ACM Multimedia.
[16] Yueting Zhuang,et al. Cross-media semantic representation via bi-directional learning to rank , 2013, ACM Multimedia.
[17] Wei Chen,et al. Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework , 2015, AAAI.
[18] Yueting Zhuang,et al. Deep Compositional Cross-modal Learning to Rank via Local-Global Alignment , 2015, ACM Multimedia.
[19] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[20] Christoph Meinel,et al. Image Captioning with Deep Bidirectional LSTMs , 2016, ACM Multimedia.
[21] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Geoffrey Zweig,et al. From captions to visual concepts and back , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.
[24] Samy Bengio,et al. A Discriminative Kernel-Based Approach to Rank Images from Text Queries , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[25] Tat-Seng Chua,et al. Deep Learning Generic Features for Cross-Media Retrieval , 2016, MMM.
[26] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[27] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] Ruifan Li,et al. Cross-modal Retrieval with Correspondence Autoencoder , 2014, ACM Multimedia.
[29] Jonathan G. Fiscus,et al. TRECVID 2016: Evaluating Video Search, Video Event Detection, Localization, and Hyperlinking , 2016, TRECVID.
[30] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[31] Chong-Wah Ngo,et al. Learning Query and Image Similarities with Ranking Canonical Correlation Analysis , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[32] Hongxun Yao,et al. Learning Cross Space Mapping via DNN Using Large Scale Click-Through Logs , 2015, IEEE Transactions on Multimedia.
[33] Björn W. Schuller,et al. Recent developments in openSMILE, the munich open-source multimedia feature extractor , 2013, ACM Multimedia.
[34] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, International Journal of Computer Vision.
[35] José M. F. Moura,et al. VisualWord2Vec (Vis-W2V): Learning Visually Grounded Word Embeddings Using Abstract Scenes , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[37] Ewan Klein,et al. Natural Language Processing with Python , 2009 .
[38] Samy Bengio,et al. Show and tell: A neural image caption generator , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[39] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.
[40] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[41] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[42] Ting Yao,et al. Deep Learning for Video Classification and Captioning , 2016, Frontiers of Multimedia Research.
[43] Sanja Fidler,et al. Predicting Deep Zero-Shot Convolutional Neural Networks Using Textual Descriptions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[44] Svetlana Lazebnik,et al. Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections , 2014, ECCV.
[45] Wei Xu,et al. Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[46] Marcus Rohrbach,et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks , 2014, NAACL.
[47] Wei Xu,et al. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) , 2014, ICLR.
[48] Jing Wang,et al. Clickage: towards bridging semantic and intent gaps via mining click logs of search engines , 2013, ACM Multimedia.
[49] Naokazu Yokoya,et al. Learning Joint Representations of Videos and Sentences with Web Image Search , 2016, ECCV Workshops.
[50] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.
[51] Cees Snoek,et al. VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events , 2014, ACM Multimedia.
[52] Christopher Joseph Pal,et al. Describing Videos by Exploiting Temporal Structure , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[53] Cees Snoek,et al. Image2Emoji: Zero-shot Emoji Prediction for Visual Media , 2015, ACM Multimedia.
[54] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[55] Lin Ma,et al. Multimodal Convolutional Neural Networks for Matching Image and Sentence , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[56] Chong-Wah Ngo,et al. Click-through-based cross-view learning for image search , 2014, SIGIR.
[57] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[58] John Shawe-Taylor,et al. Canonical Correlation Analysis: An Overview with Application to Learning Methods , 2004, Neural Computation.
[59] Samy Bengio,et al. Zero-Shot Learning by Convex Combination of Semantic Embeddings , 2013, ICLR.
[60] Larry P. Heck,et al. Learning deep structured semantic models for web search using clickthrough data , 2013, CIKM.
[61] Xiaoyong Du,et al. Image Retrieval by Cross-Media Relevance Fusion , 2015, ACM Multimedia.