VideoWhisper: Toward Discriminative Unsupervised Video Feature Learning With Attention-Based Recurrent Neural Networks
暂无分享,去创建一个
Meng Wang | Tat-Seng Chua | Richang Hong | Na Zhao | Hanwang Zhang | Hanwang Zhang | Meng Wang | Tat-Seng Chua | Richang Hong | Na Zhao
[1] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[2] Yoshua Bengio,et al. Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks , 2015, IEEE Transactions on Multimedia.
[3] Yoshua Bengio,et al. Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.
[4] Yoshua Bengio,et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.
[5] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[6] Trevor Darrell,et al. Sequence to Sequence -- Video to Text , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).
[7] Marc'Aurelio Ranzato,et al. Video (language) modeling: a baseline for generative models of natural videos , 2014, ArXiv.
[8] Xi Wang,et al. Modeling Spatial-Temporal Clues in a Hybrid Deep Learning Framework for Video Classification , 2015, ACM Multimedia.
[9] Huanbo Luan,et al. Discrete Collaborative Filtering , 2016, SIGIR.
[10] Chong-Wah Ngo,et al. Human Action Recognition in Unconstrained Videos by Explicit Motion Modeling , 2015, IEEE Transactions on Image Processing.
[11] Shih-Fu Chang,et al. Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[12] Nitish Srivastava,et al. Unsupervised Learning of Video Representations using LSTMs , 2015, ICML.
[13] Tao Mei,et al. Super Fast Event Recognition in Internet Videos , 2015, IEEE Transactions on Multimedia.
[14] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[15] Yoshua Bengio,et al. End-to-end attention-based large vocabulary speech recognition , 2015, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[16] Alexander G. Hauptmann,et al. MoSIFT: Recognizing Human Actions in Surveillance Videos , 2009 .
[17] Meng Wang,et al. VIDEOWHISPER: Towards unsupervised learning of discriminative features of videos with RNN , 2017, 2017 IEEE International Conference on Multimedia and Expo (ICME).
[18] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.
[19] Chong-Wah Ngo,et al. Video Event Detection Using Motion Relativity and Feature Selection , 2014, IEEE Transactions on Multimedia.
[20] Yihong Gong,et al. Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[21] Meng Wang,et al. Stochastic Multiview Hashing for Large-Scale Near-Duplicate Video Retrieval , 2017, IEEE Transactions on Multimedia.
[22] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[23] Xinhang Song,et al. Rich Image Description Based on Regions , 2015, ACM Multimedia.
[24] Ivan Laptev,et al. On Space-Time Interest Points , 2005, International Journal of Computer Vision.
[25] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[26] Shih-Fu Chang,et al. Visual Translation Embedding Network for Visual Relation Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[28] David A. Shamma,et al. The New Data and New Challenges in Multimedia Research , 2015, ArXiv.
[29] Quoc V. Le,et al. Sequence to Sequence Learning with Neural Networks , 2014, NIPS.
[30] Thomas Mensink,et al. Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.
[31] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[32] Zi Huang,et al. Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval , 2013, IEEE Transactions on Multimedia.
[33] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[34] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[35] Yi Yang,et al. A discriminative CNN video representation for event detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Bo Pang,et al. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales , 2005, ACL.
[37] Andrew Y. Ng,et al. Learning Feature Representations with K-Means , 2012, Neural Networks: Tricks of the Trade.