Paper Information - Mid-level Representation for Visual Recognition

Visual recognition is one of the fundamental challenges in AI, where the goal is to understand the semantics of visual data. Employing mid-level representations, in particular, has shifted the paradigm in visual recognition. A mid-level image/video representation involves discovering and training a set of mid-level visual patterns (e.g., parts and attributes) and representing a given image/video using them. The mid-level patterns can be extracted from images and videos using the motion and appearance information of visual phenomena. This thesis targets employing mid-level representations for different high-level visual recognition tasks, namely (i) image understanding and (ii) video understanding. In the case of image understanding, we focus on the object detection/recognition task. We investigate discovering and learning a set of mid-level patches to be used for representing the images of an object category. We specifically employ the discriminative patches in a subcategory-aware, webly-supervised fashion. We additionally study the outcomes provided by employing the subcategory-based models for undoing dataset bias.

Moin Nabi
