NII-HITACHI-UIT at TRECVID 2017
Duy-Dinh Le | Vinh-Tiep Nguyen | Zheng Wang | Manikandan Ravikiran | Thanh Duc Ngo | Tomokazu Murakami | Martin Klinkigt | Shin'ichi Satoh | Atsushi Hiroike | Tomoaki Yoshinaga | Hung Quoc Vo | Quan Kong | Vu-Minh-Hieu Dang | Duy-Nhat Nguyen | Jian Vora | Mohit Chabra | Tien-Van Do | Sinha Saptarshi | Charles Limasanches | Tushar Agrawal