Multi-modal tag localization for mobile video search
Sheng Tang | Yongdong Zhang | Jintao Li | Rui Zhang | Wu Liu
[1] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[2] Juhan Nam,et al. Multimodal Deep Learning , 2011, ICML.
[3] Yongdong Zhang,et al. Instant Mobile Video Search With Layered Audio-Video Indexing and Progressive Transmission , 2014, IEEE Transactions on Multimedia.
[4] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[5] Weifeng Liu,et al. Multiview Hessian Regularization for Image Annotation , 2013, IEEE Transactions on Image Processing.
[6] Adrian Ulges,et al. Multiple Instance Learning from Weakly Labeled Videos , 2009 .
[7] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[8] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.
[9] Sheng Tang,et al. Multimodal tag localization based on deep learning , 2015, ICIMCS '15.
[10] Marc'Aurelio Ranzato,et al. DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.
[11] Nitish Srivastava,et al. Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..
[12] Adrian Ulges,et al. Identifying relevant frames in weakly labeled videos for training concept detectors , 2008, CIVR '08.
[13] Bin Liu,et al. Localizing relevant frames in web videos using topic model and relevance filtering , 2013, Machine Vision and Applications.
[14] Pietro Perona,et al. Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.
[15] Yong Wang,et al. Coherent image annotation by learning semantic distance , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[16] Ross B. Girshick,et al. Fast R-CNN , 2015, arXiv:1504.08083.
[17] Wei-Ta Chu,et al. Tag suggestion and localization for web videos by bipartite graph matching , 2011, WSM '11.
[18] Xian-Sheng Hua,et al. To learn representativeness of video frames , 2005, MULTIMEDIA '05.
[19] Jiebo Luo,et al. Towards Extracting Semantically Meaningful Key Frames From Personal Video Clips: From Humans to Computers , 2009, IEEE Transactions on Circuits and Systems for Video Technology.
[20] Yuan Yan Tang,et al. Multiview Hessian discriminative sparse coding for image annotation , 2013, Comput. Vis. Image Underst..
[21] Alberto Del Bimbo,et al. Tag suggestion and localization in user-generated videos based on social knowledge , 2010, WSM@MM.
[22] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[23] Haojie Li,et al. DUT-WEBV: A Benchmark Dataset for Performance Evaluation of Tag Localization for Web Video , 2013, MMM.
[24] Tao Mei,et al. Jointly Modeling Embedding and Translation to Bridge Video and Language , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Jeffrey Dean,et al. Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.
[26] Alberto Del Bimbo,et al. A data-driven approach for tag refinement and localization in web videos , 2015, Comput. Vis. Image Underst..
[27] Nitish Srivastava,et al. Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..
[28] Meng Wang,et al. ShotTagger: tag location for internet videos , 2011, ICMR.
[29] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[30] Dacheng Tao,et al. Multi-View Intact Space Learning , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[31] Dacheng Tao,et al. Large-Margin Multi-View Information Bottleneck , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[32] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[33] Tao Mei,et al. Contextual Video Recommendation by Multimodal Relevance and User Feedback , 2011, TOIS.
[34] Yongdong Zhang,et al. Multi-task deep visual-semantic embedding for video thumbnail selection , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[35] Xuelong Li,et al. Hessian Regularized Support Vector Machines for Mobile Image Annotation on the Cloud , 2013, IEEE Transactions on Multimedia.
[36] Trevor Darrell,et al. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.