ACMNet: Adaptive Confidence Matching Network for Human Behavior Analysis via Cross-modal Retrieval
HUI CHEN | GUIGUANG DING | ZIJIA LIN | SICHENG ZHAO | GU XIAOPENG
[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[2] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[3] Rodrigo C. Barros,et al. Bidirectional Retrieval Made Simple , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[4] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[5] Yan Huang,et al. Learning Semantic Concepts and Order for Image and Sentence Matching , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[6] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Michael S. Bernstein,et al. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.
[8] Xuelong Li,et al. Robust Visual Tracking Using Structurally Random Projection and Weighted Least Squares , 2015, IEEE Transactions on Circuits and Systems for Video Technology.
[9] Byoung-Tak Zhang,et al. Bilinear Attention Networks , 2018, NeurIPS.
[10] Qionghai Dai,et al. DECODE: Deep Confidence Network for Robust Image Classification , 2019, IEEE Transactions on Image Processing.
[11] Qiang Ni,et al. Joint Image-Text Hashing for Fast Large-Scale Cross-Media Retrieval Using Self-Supervised Deep Learning , 2019, IEEE Transactions on Industrial Electronics.
[12] Wenwu Zhu,et al. Message From the Outgoing Editor-in-Chief , 2020, IEEE Trans. Multim..
[13] Jeffrey Dean,et al. Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.
[14] Tomas Mikolov,et al. Bag of Tricks for Efficient Text Classification , 2016, EACL.
[15] Xirong Li,et al. Dual Encoding for Zero-Example Video Retrieval , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[16] Wei Wang,et al. Instance-Aware Image and Sentence Matching with Selective Multimodal LSTM , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[17] Yongdong Zhang,et al. STAT: Spatial-Temporal Attention Mechanism for Video Captioning , 2020, IEEE Transactions on Multimedia.
[18] Zhedong Zheng,et al. Dual-path Convolutional Image-Text Embeddings with Instance Loss , 2017, ACM Trans. Multim. Comput. Commun. Appl..
[19] Ruslan Salakhutdinov,et al. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models , 2014, ArXiv.
[20] Jung-Woo Ha,et al. Dual Attention Networks for Multimodal Reasoning and Matching , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Rama Chellappa,et al. Learning Common and Feature-Specific Patterns: A Novel Multiple-Sparse-Representation-Based Tracker , 2018, IEEE Transactions on Image Processing.
[22] Yoshua Bengio,et al. Attention-Based Models for Speech Recognition , 2015, NIPS.
[23] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[24] Qionghai Dai,et al. Cross-Modality Bridging and Knowledge Transferring for Image Understanding , 2019, IEEE Transactions on Multimedia.
[25] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Qingming Huang,et al. Hedging Deep Features for Visual Tracking , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[27] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[28] Svetlana Lazebnik,et al. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[29] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[30] Gang Wang,et al. Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[31] Amit K. Roy-Chowdhury,et al. Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval , 2018, ACM Multimedia.
[32] Debin Zhao,et al. Multi-layered gesture recognition with Kinect , 2015.
[33] Sanja Fidler,et al. Order-Embeddings of Images and Language , 2015, ICLR.
[34] David J. Fleet,et al. VSE++: Improving Visual-Semantic Embeddings with Hard Negatives , 2017, BMVC.
[35] Lei Zhang,et al. Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[36] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[37] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[38] Pong C. Yuen,et al. Learning Modality-Consistency Feature Templates: A Robust RGB-Infrared Tracking System , 2019, IEEE Transactions on Industrial Electronics.
[39] Lukasz Kaiser,et al. Attention is All you Need , 2017, NIPS.
[40] Hui Chen,et al. Show, Observe and Tell: Attribute-driven Attention Model for Image Captioning , 2018, IJCAI.
[41] Xi Chen,et al. Stacked Cross Attention for Image-Text Matching , 2018, ECCV.
[42] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[43] Hui Chen,et al. GRN: Gated Relation Network to Enhance Convolutional Neural Network for Named Entity Recognition , 2019, AAAI.
[44] Pong C. Yuen,et al. Robust Visual Tracking via Basis Matching , 2017, IEEE Transactions on Circuits and Systems for Video Technology.
[45] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[46] Ling Shao,et al. Unsupervised Deep Video Hashing via Balanced Code for Large-Scale Video Retrieval , 2019, IEEE Transactions on Image Processing.
[47] Rama Chellappa,et al. Joint Sparse Representation and Robust Feature-Level Fusion for Multi-Cue Visual Tracking , 2015, IEEE Transactions on Image Processing.
[48] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[49] Xuelong Li,et al. A Biologically Inspired Appearance Model for Robust Visual Tracking , 2017, IEEE Transactions on Neural Networks and Learning Systems.
[50] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[51] Luke S. Zettlemoyer,et al. Deep Contextualized Word Representations , 2018, NAACL.
[52] Tomas Mikolov,et al. Enriching Word Vectors with Subword Information , 2016, TACL.
[53] Luca Antiga,et al. Automatic differentiation in PyTorch , 2017.
[54] Yang Gao,et al. Multi-layered gesture recognition with Kinect , 2015, J. Mach. Learn. Res..
[55] Aviv Eisenschtat,et al. Linking Image and Text with 2-Way Nets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[56] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[57] Krystian Mikolajczyk,et al. Deep correlation for matching images and text , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[58] Yuxin Peng,et al. CM-GANs , 2019, ACM Trans. Multim. Comput. Commun. Appl..
[59] Sam Coope,et al. Neural Named Entity Recognition Using a Self-Attention Mechanism , 2017, 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI).
[60] Yin Li,et al. Learning Deep Structure-Preserving Image-Text Embeddings , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[61] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[62] Gang Hua,et al. Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[63] Jungong Han,et al. Real-Time Scalable Visual Tracking via Quadrangle Kernelized Correlation Filters , 2018, IEEE Transactions on Intelligent Transportation Systems.
[64] Meng Liu,et al. Attentive Moment Retrieval in Videos , 2018, SIGIR.