暂无分享,去创建一个
[1] Andrew Zisserman,et al. Deep Fisher Networks for Large-Scale Image Classification , 2013, NIPS.
[2] Martín Abadi,et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.
[3] Anastasios Tefas,et al. Learning Neural Bag-of-Features for Large-Scale Image Retrieval , 2017, IEEE Transactions on Systems, Man, and Cybernetics: Systems.
[4] Lorenzo Torresani,et al. Learning Spatiotemporal Features with 3D Convolutional Networks , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).
[5] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.
[6] Andrew Zisserman,et al. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[7] Lior Wolf,et al. RNN Fisher Vectors for Action Recognition and Image Annotation , 2015, ECCV.
[8] Trevor Darrell,et al. Long-term recurrent convolutional networks for visual recognition and description , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[9] Robert A. Jacobs,et al. Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.
[10] Cordelia Schmid,et al. Long-Term Temporal Convolutions for Action Recognition , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[11] Yi Yang,et al. A discriminative CNN video representation for event detection , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Limin Wang,et al. Action recognition with trajectory-pooled deep-convolutional descriptors , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[13] Barbara Caputo,et al. Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..
[14] Andrew Zisserman,et al. Convolutional Two-Stream Network Fusion for Video Action Recognition , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Yu Qiao,et al. Action Recognition with Stacked Fisher Vectors , 2014, ECCV.
[16] Florent Perronnin,et al. Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.
[17] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.
[18] Miha Skalic,et al. Deep Learning Methods for Efficient Large Scale Video Labeling , 2017, ArXiv.
[19] Matthew J. Hausknecht,et al. Beyond short snippets: Deep networks for video classification , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[20] Greg Mori,et al. A Hierarchical Deep Temporal Model for Group Activity Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[21] Luc Van Gool,et al. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition , 2016, ECCV.
[22] Fei-Fei Li,et al. Large-Scale Video Classification with Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[23] Xiao Liu,et al. Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding , 2017, ArXiv.
[24] Tinne Tuytelaars,et al. Modeling video evolution for action recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[25] Ming Yang,et al. 3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[26] Xi Wang,et al. Aggregating Frame-level Features for Large-Scale Video Classification , 2017, ArXiv.
[27] Yann Dauphin,et al. Language Modeling with Gated Convolutional Networks , 2016, ICML.
[28] Cordelia Schmid,et al. Stable Hyper-pooling and Query Expansion for Event Detection , 2013, 2013 IEEE International Conference on Computer Vision.
[29] Aren Jansen,et al. CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[30] Antonio Manuel López Peña,et al. Sympathy for the Details: Dense Trajectories and Hybrid Classification Architectures for Action Recognition , 2016, ECCV.
[31] Apostol Natsev,et al. YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.
[32] Cordelia Schmid,et al. Action Recognition with Improved Trajectories , 2013, 2013 IEEE International Conference on Computer Vision.
[33] Cordelia Schmid,et al. Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[34] Andrew Zisserman,et al. Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[35] Florent Perronnin,et al. Fisher vectors meet Neural Networks: A hybrid classification architecture , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[36] Yoshua Bengio,et al. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches , 2014, SSST@EMNLP.
[37] Christian Wolf,et al. Sequential Deep Learning for Human Action Recognition , 2011, HBU.
[38] Michael Isard,et al. Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[39] Richard P. Wildes,et al. Spatiotemporal Multiplier Networks for Video Action Recognition , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[40] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[41] Abhinav Gupta,et al. ActionVLAD: Learning Spatio-Temporal Aggregation for Action Classification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[43] Ji Wu,et al. The Monkeytyping Solution to the YouTube-8M Video Understanding Challenge , 2017, ArXiv.
[44] Gabriela Csurka,et al. Visual categorization with bags of keypoints , 2002, eccv 2004.
[45] Juergen Gall,et al. A bag-of-words equivalent recurrent neural network for action recognition , 2017, Comput. Vis. Image Underst..
[46] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[47] Andrew Zisserman,et al. Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.
[48] Tomás Pajdla,et al. NetVLAD: CNN Architecture for Weakly Supervised Place Recognition , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[49] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[50] Cordelia Schmid,et al. Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[51] Limin Wang,et al. Boosting VLAD with Supervised Dictionary Learning and High-Order Statistics , 2014, ECCV.
[52] Christoph H. Lampert,et al. Deep Fisher Kernels -- End to End Learning of the Fisher Kernel GMM Parameters , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.
[53] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[54] Yang Gao,et al. Compact Bilinear Pooling , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).