Ontological inference for image and video analysis

This paper presents an approach to designing and implementing extensible computational models for perceiving systems based on knowledge-driven joint inference. These models can integrate different sources of information both horizontally (multi-modal and temporal fusion) and vertically (bottom-up and top-down) by incorporating prior hierarchical knowledge expressed as an extensible ontology. Two implementations of this approach are presented. The first is a content-based image retrieval system that allows users to search image databases using an ontological query language. Queries are parsed using a probabilistic grammar, and Bayesian networks are used to map high-level concepts onto low-level image descriptors, thereby bridging the ‘semantic gap’ between users and the retrieval system. The second application extends the notion of ontological languages to video event detection. It is shown how effective high-level state and event recognition mechanisms can be learned from a set of annotated training sequences by incorporating syntactic and semantic constraints represented by an ontology.
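
To make the idea of mapping a high-level concept onto low-level image descriptors concrete, the following is a minimal sketch, not the paper's implementation. It assumes a hand-built two-layer Bayesian network in which one concept node (a hypothetical "sky") is the parent of discretised region descriptors (hypothetical "colour" and "position"), and the descriptors are assumed conditionally independent given the concept. The class `ConceptNetwork`, the function `rank_regions`, and all probabilities are illustrative inventions. Posteriors are computed by exact enumeration over the concept node, and image regions are ranked by how strongly they support the queried concept.

```python
"""Sketch: a two-layer Bayesian network linking one high-level concept to
discretised low-level region descriptors. All names and probabilities are
hypothetical illustrations, not values from the paper."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ConceptNetwork:
    prior: float  # P(concept present) for an image region
    # P(descriptor value | concept present), one table per descriptor
    likelihood_present: dict[str, dict[str, float]]
    # P(descriptor value | concept absent), one table per descriptor
    likelihood_absent: dict[str, dict[str, float]]

    def posterior(self, evidence: dict[str, str]) -> float:
        """P(concept | observed descriptors), by enumerating both states of
        the concept node (descriptors assumed independent given the concept)."""
        p_yes = self.prior
        p_no = 1.0 - self.prior
        for descriptor, value in evidence.items():
            p_yes *= self.likelihood_present[descriptor][value]
            p_no *= self.likelihood_absent[descriptor][value]
        return p_yes / (p_yes + p_no)


def rank_regions(network: ConceptNetwork, regions: dict[str, dict[str, str]]) -> list[str]:
    """Order region ids by posterior support for the queried concept."""
    scores = {rid: network.posterior(ev) for rid, ev in regions.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical network for the concept "sky".
sky = ConceptNetwork(
    prior=0.2,
    likelihood_present={
        "colour": {"blue": 0.7, "grey": 0.25, "green": 0.05},
        "position": {"top": 0.8, "middle": 0.15, "bottom": 0.05},
    },
    likelihood_absent={
        "colour": {"blue": 0.1, "grey": 0.3, "green": 0.6},
        "position": {"top": 0.2, "middle": 0.4, "bottom": 0.4},
    },
)

# Hypothetical descriptor values extracted from three segmented regions.
regions = {
    "r1": {"colour": "blue", "position": "top"},
    "r2": {"colour": "green", "position": "bottom"},
    "r3": {"colour": "grey", "position": "top"},
}

if __name__ == "__main__":
    for rid in rank_regions(sky, regions):
        print(rid, round(sky.posterior(regions[rid]), 3))
    # expected order: r1, r3, r2
```

For region r1 the unnormalised terms are 0.2 × 0.7 × 0.8 = 0.112 (present) and 0.8 × 0.1 × 0.2 = 0.016 (absent), giving a posterior of 0.875. A full system would learn these tables from annotated data and combine several concept networks according to the parsed query, which this sketch deliberately omits.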
