Paper information - Event modelling and recognition in video

Event modelling and recognition in video

The management of digital video has become a very challenging problem as the amount of video content continues to witness phenomenal growth. This trend necessitates the development of advanced techniques for the efficient and effective manipulation of video information. However, the performance of current video processing tools has not yet reached the required level, mainly due to the gap between computer-generated semantic descriptions of video content and human interpretations of the same content, a discrepancy commonly referred to as the semantic gap. Inspired by recent studies in neuroscience suggesting that humans remember real life using past experience structured in events, in this thesis we investigate the use of appropriate models and machine learning approaches for representing and recognizing events in video. Specifically, a joint content-event model is proposed for describing video content (e.g., shots and scenes), as well as real-life events (e.g., a demonstration or a birthday party) and their key semantic entities (participants, location, etc.). At the core of this model stands a referencing mechanism which utilizes a set of video analysis algorithms for the automatic generation of event model instances and their enrichment with semantic information extracted from the video content. In particular, a set of subclass discriminant analysis and support vector machine methods is proposed for handling data nonlinearities and addressing several limitations of current state-of-the-art approaches. These approaches are evaluated using several publicly available benchmarks particularly suited for testing the robustness and reliability of nonlinear classification methods, such as the facial image collection of the Four Face database, datasets from the UCI repository, and others.
Moreover, the most efficient of the proposed methods are additionally evaluated using a large-scale video collection, consisting of the datasets provided in the TRECVID multimedia event detection (MED) track of 2010 and 2011, which are among the most challenging in this field, for the tasks of event detection and event

Nikolaos Gkalelis
