(Unseen) event recognition via semantic compositionality

High-level events in images (e.g. “dinner” or “motorcycle stunt”) may not be directly correlated with their visual appearance, so low-level visual features do not carry enough semantics to classify such events satisfactorily. This paper explores a fully compositional approach to event-based image retrieval that overcomes this shortcoming. The approach also scales in two directions: both new events and new primitives can be added. Using the Pascal VOC 2007 dataset, our contributions are the following: (i) We apply the Faceted Analysis-Synthesis Theory (FAST) to build a hierarchy of 228 high-level events. (ii) We show that rule-based classifiers are better suited than SVMs for compositional recognition of events. In addition, rule-based classifiers provide semantically meaningful event descriptions, which help bridge the semantic gap. (iii) We demonstrate that compositionality enables unseen event recognition: rules learned from non-visual cues, combined with object detectors, give reasonable performance on unseen event categories.
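The idea of recognising an event by composing object detections under a rule can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the rules are hand-written rather than learned, each rule is taken to be a disjunction of conjunctions over Pascal VOC object classes, and the names `RULES`, `detected_primitives`, `recognize_events` and the 0.5 threshold are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical rules, written by hand for illustration only.
# An event fires if ANY of its clauses is satisfied; a clause is
# satisfied if ALL of its object primitives are detected.
RULES: Dict[str, List[FrozenSet[str]]] = {
    "dinner": [
        frozenset({"person", "diningtable"}),
        frozenset({"person", "bottle", "chair"}),
    ],
    "motorcycle stunt": [
        frozenset({"person", "motorbike"}),
    ],
}


def detected_primitives(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Turn per-class detector confidences into a set of present objects."""
    return {name for name, score in scores.items() if score >= threshold}


def recognize_events(
    scores: Dict[str, float],
    rules: Dict[str, List[FrozenSet[str]]] = RULES,
    threshold: float = 0.5,
) -> List[str]:
    """Return the events whose rule is satisfied by the detected objects."""
    present = detected_primitives(scores, threshold)
    return sorted(
        event
        for event, clauses in rules.items()
        if any(clause <= present for clause in clauses)  # subset test per clause
    )


# Detector outputs for one image (made-up confidences).
image_scores = {"person": 0.9, "diningtable": 0.8, "motorbike": 0.1}
print(recognize_events(image_scores))
```

Because a rule refers only to object primitives, an event with no training images can in principle be added by writing or learning a new entry in `RULES` from non-visual data, which is the sense in which such a scheme supports unseen events. The satisfied clause also doubles as a human-readable description of why the event was predicted.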
