Visual Feature Learning

Categorization is a fundamental problem in many computer vision applications, e.g., image classification, pedestrian detection and face recognition. The robustness of a categorization system relies heavily on the quality of the features by which data are represented. Prior work on feature extraction can be grouped into levels, which, in bottom-up order, are low-level features (e.g., pixels and gradients) and middle/high-level features (e.g., the bag-of-words (BoW) model and sparse coding). Low-level features can be extracted directly from images or videos, while middle/high-level features are constructed upon low-level features and are designed to enhance the capability of categorization systems based on different considerations (e.g., guaranteeing domain invariance and improving discriminative power). This thesis focuses on the study of visual feature learning. The challenges that remain in designing visual features lie in intra-class variation, occlusion, illumination and viewpoint changes, and insufficient prior knowledge. To address these challenges, I present several visual feature learning methods covering the following sub-topics: (i) I start by introducing a segmentation-based object recognition system. (ii) When training data are insufficient, I seek data from other resources, which include images or videos in a different domain, actions captured from a different viewpoint and information in a different media form. In order to appropriately transfer such resources into the target categorization system, four transfer learning-based feature learning methods are presented, addressing the cross-view, cross-domain and cross-modality scenarios. (iii) Finally, I present a random-forest-based feature fusion method for multi-view action recognition.


[59]  James L. Crowley,et al.  Local Scale Selection for Gaussian Based Description Techniques , 2000, ECCV.

[60]  David G. Lowe,et al.  Towards a Computational Model for Object Recognition in IT Cortex , 2000, Biologically Motivated Computer Vision.

[61]  Tony Lindeberg,et al.  Feature Detection with Automatic Scale Selection , 1998, International Journal of Computer Vision.

[62]  J. Gibson The Ecological Approach to Visual Perception , 1979 .

[63]  Yali Amit,et al.  A Computational Model for Visual Selection , 1999, Neural Computation.

[64]  David Casasent,et al.  GENERAL METHODOLOGY FOR SIMULTANEOUS REPRESENTATION AND DISCRIMINATION OF MULTIPLE OBJECT CLASSES , 1998 .

[65]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[66]  M. Tarr Visual Pattern Recognition , 1998 .

[67]  Bernt Schiele,et al.  Object Recognition Using Multidimensional Receptive Field Histograms , 1996, ECCV.

[68]  Justus H. Piater,et al.  Toward learning visual discrimination strategies , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[69]  M. Tarr,et al.  Becoming a “Greeble” Expert: Exploring Mechanisms for Face Recognition , 1997, Vision Research.

[70]  T. Poggio,et al.  Hierarchical models of object recognition in cortex , 1999, Nature Neuroscience.

[71]  Justus H. Piater,et al.  A Framework for Learning Visual Discrimination , 1999, FLAIRS.

[72]  Michael J. Black,et al.  EigenTracking: Robust Matching and Tracking of Articulated Objects Using a View-Based Representation , 1996, International Journal of Computer Vision.

[73]  Tony Lindeberg,et al.  Scale-Space Theory in Computer Vision , 1993, Lecture Notes in Computer Science.

[74]  R. L. Solso,et al.  Prototype formation of faces: A case of pseudo-memory , 1981 .

[75]  M. Turk,et al.  Eigenfaces for Recognition , 1991, Journal of Cognitive Neuroscience.

[76]  Norbert Krüger,et al.  ORASSYLL: Object Recognition with Autonomously Learned and Sparse Symbolic Representations Based on Local Line Detectors , 1998, BMVC.

[77]  D. Geman,et al.  Efficient Focusing and Face Detection , 1998 .

[78]  Soheil Shams Multiple elastic modules for visual pattern recognition , 1995, Neural Networks.

[79]  F. Yates Contributions to Mathematical Statistics , 1951, Nature.

[80]  P. Schyns,et al.  The Ontogeny of Part Representation in Object Concepts , 1994 .

[81]  D. Hubel,et al.  Receptive fields, binocular interaction and functional architecture in the cat's visual cortex , 1962, The Journal of physiology.

[82]  P. Utgoff,et al.  A Kolmogorov-Smirnoff Metric for Decision Tree Induction , 1996 .

[83]  Gérard G. Medioni,et al.  The Challenge of Generic Object Recognition , 1994, Object Representation in Computer Vision.

[84]  James L. Crowley,et al.  Object Recognition Using Coloured Receptive Fields , 2000, ECCV.

[85]  John W. Tukey,et al.  A Projection Pursuit Algorithm for Exploratory Data Analysis , 1974, IEEE Transactions on Computers.

[86]  Ruzena Bajcsy,et al.  Active and exploratory perception , 1992, CVGIP Image Underst..

[87]  Nicholas I. Fisher,et al.  Statistical Analysis of Circular Data , 1993 .

[88]  Sebastian Thrun,et al.  Lifelong robot learning , 1993, Robotics Auton. Syst..

[89]  Paul A. Viola Complex Feature Recognition: A Bayesian Approach for Learning to Recognize Objects , 1996 .

[90]  Juyang Weng,et al.  Using Discriminant Eigenfeatures for Image Retrieval , 1996, IEEE Trans. Pattern Anal. Mach. Intell..

[91]  Yoshua Bengio,et al.  Pattern Recognition and Neural Networks , 1995 .

[92]  J. Koenderink,et al.  Representation of local geometry in the visual system , 1987, Biological Cybernetics.

[93]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[94]  N. Logothetis,et al.  Psychophysical and physiological evidence for viewer-centered object representations in the primate. , 1995, Cerebral cortex.

[95]  Rajesh P. N. Rao,et al.  Embodiment is the foundation, not a level , 1996, Behavioral and Brain Sciences.

[96]  Roderic A. Grupen,et al.  Dynamic Control Models as State Abstractions , 1998 .

[97]  David Casasent,et al.  Classification and pose estimation of objects using nonlinear features , 1998, Defense, Security, and Sensing.

[98]  S. Nayar,et al.  Early Visual Learning , 1996 .

[99]  Luc Stells,et al.  Constructing and Sharing Perceptual Distiinctions , 1997, ECML.

[100]  R A Young,et al.  The Gaussian derivative model for spatial vision: I. Retinal mechanisms. , 1988, Spatial vision.

[101]  Yiannis Aloimonos,et al.  Active vision , 2004, International Journal of Computer Vision.

[102]  L P Acredolo,et al.  The role of self-produced movement and visual tracking in infant spatial orientation. , 1984, Journal of experimental child psychology.

[103]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[104]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[105]  Justus H. Piater,et al.  Feature learning for recognition with Bayesian networks , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[106]  W. Eric L. Grimson,et al.  Model-based recognition and localization from tactile data , 1984, ICRA.

[107]  Juyang Weng,et al.  Vision-guided navigation using SHOSLIF , 1998, Neural Networks.

[108]  Tony Lindeberg,et al.  Edge Detection and Ridge Detection with Automatic Scale Selection , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[109]  Ramesh C. Jain,et al.  Recognizing partially visible objects using feature indexed hypotheses , 1986, IEEE J. Robotics Autom..

[110]  M. E. McCarty,et al.  How infants use vision for grasping objects. , 2001, Child development.

[111]  Andrea Salgian,et al.  A cubist approach to object recognition , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[112]  Ron Kohavi,et al.  Irrelevant Features and the Subset Selection Problem , 1994, ICML.

[113]  Andrew McCallum,et al.  Overcoming Incomplete Perception with Utile Distinction Memory , 1993, ICML.

[114]  Hiroshi Murase,et al.  Visual learning and recognition of 3-d objects from appearance , 2005, International Journal of Computer Vision.

[115]  Robert B. Fisher,et al.  Integrating Iconic and Structured Matching , 1998, ECCV.

[116]  Justus H. Piater,et al.  Constructive Feature Learning and the Development of Visual Expertise , 2000, ICML.

[117]  R. Manmatha,et al.  Retrieving images by appearance , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[118]  H. Ruff Infant recognition of the invariant form of objects. , 1978, Child development.

[119]  Luc Steels,et al.  Generation and Selection of Sensory Channels , 1999, EvoWorkshops.