Scene Aligned Pooling for Complex Video Recognition
Gang Hua | Shih-Fu Chang | John R. Smith | Yadong Mu | Liangliang Cao | Apostol Natsev
[1] Antonio Torralba, et al. Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, 2001, International Journal of Computer Vision.
[2] G. Wahba, et al. Some results on Tchebycheffian spline functions, 1971.
[3] Nancy Kanwisher, et al. A cortical representation of the local visual environment, 1998, Nature.
[4] Cordelia Schmid, et al. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).
[5] John R. Smith, et al. Semantic representation: search and mining of multimedia content, 2004, KDD '04.
[6] Thomas Serre, et al. A Biologically Inspired System for Action Recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.
[7] Barbara Caputo, et al. Recognizing human actions: a local SVM approach, 2004, ICPR 2004.
[8] Thomas Serre, et al. HMDB: A large video database for human motion recognition, 2011, 2011 International Conference on Computer Vision.
[9] Cordelia Schmid, et al. Actions in context, 2009, CVPR.
[10] Jean Ponce, et al. Learning mid-level features for recognition, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[11] Nicolas Le Roux, et al. Ask the locals: Multi-way local pooling for image recognition, 2011, 2011 International Conference on Computer Vision.
[12] Krista A. Ehinger, et al. SUN database: Large-scale scene recognition from abbey to zoo, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[13] Cordelia Schmid, et al. Learning realistic human actions from movies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.
[14] Jitendra Malik, et al. Recognizing action at a distance, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.
[15] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[16] Juan Carlos Niebles, et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, 2008, International Journal of Computer Vision.
[17] A. Friedman. Framing pictures: the role of knowledge in automatized encoding and memory for gist, 1979, Journal of experimental psychology. General.
[18] Serge J. Belongie, et al. Behavior recognition via sparse spatio-temporal features, 2005, 2005 IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance.
[19] Cordelia Schmid, et al. Aggregating local descriptors into a compact image representation, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.
[20] Fei-Fei Li, et al. What, where and who? Classifying events by scene and object recognition, 2007, 2007 IEEE 11th International Conference on Computer Vision.
[21] Gang Hua, et al. Semantic Model Vectors for Complex Video Event Recognition, 2012, IEEE Transactions on Multimedia.
[22] David Elliott, et al. In the Wild, 2010.
[23] Juan Carlos Niebles, et al. Unsupervised Learning of Human Action Categories, 2006.
[24] Umeshwar Dayal, et al. K-Harmonic Means - A Data Clustering Algorithm, 1999.
[25] Pietro Perona, et al. A Bayesian hierarchical model for learning natural scene categories, 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).
[26] Yihong Gong, et al. Linear spatial pyramid matching using sparse coding for image classification, 2009, CVPR.
[27] Chih-Jen Lin, et al. Large linear classification when data cannot fit in memory, 2010, KDD '10.
[28] Simon Haykin, et al. Gradient-Based Learning Applied to Document Recognition, 2001.
[29] Ronen Basri, et al. Actions as Space-Time Shapes, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[30] Daniel P. W. Ellis, et al. IBM Research and Columbia University TRECVID-2011 Multimedia Event Detection (MED) System, 2011, TRECVID.
[32] Barbara Caputo, et al. Recognizing human actions: a local SVM approach, 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004.
[33] Jiebo Luo, et al. Improved semantic region labeling based on scene context, 2005, 2005 IEEE International Conference on Multimedia and Expo.
[34] Antonio Torralba, et al. Object Recognition by Scene Alignment, 2007, NIPS.
[35] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.
[36] Jiebo Luo, et al. Recognizing realistic actions from videos “in the wild”, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.