Semi-automatic, data-driven construction of multimedia ontologies

In this paper we investigate the semi-automatic construction of multimedia ontologies using a data-driven approach. We start with a collection of videos for which we wish to build an ontology (an explicit specification of a domain). Each video is pre-processed with scene cut detection, automatic speech recognition (ASR), and metadata extraction. In addition, we automatically index the videos by visual content, extracting syntactic features (e.g., color, texture) and semantic features (e.g., face, landscape). We then combine standard ontology engineering tools with content-based retrieval tools to build ontologies semi-automatically. In the first stage we process the text available with the videos (ASR output, metadata, and annotations, if any): stop words (e.g., a, on, the) are eliminated, and statistics (e.g., frequency, TF-IDF, and entropy) are computed for all remaining terms. Based on these statistics we manually select the concepts and relationships to include in the ontology. Then we use content-based retrieval tools to assign multimedia entities (e.g., shots, videos, collections of videos) to concepts, properties, or relationships in the ontology, and to select multimedia entities that themselves serve as concepts, relationships, or properties. We apply this methodology to construct multimedia ontologies from 24 hours of educational films from the 1940s-1960s used in the TREC video retrieval benchmark, and we discuss the problems encountered and future directions.
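The text-processing stage described above (stop-word removal followed by per-term statistics) can be illustrated with a minimal sketch. The abstract does not give the exact formulas or tools used, so everything here is an assumption: the tiny `STOP_WORDS` set, the hypothetical helper names `tokenize` and `term_statistics`, the TF-IDF variant (collection frequency times the natural log of inverse document frequency), and entropy taken as the Shannon entropy of a term's distribution across documents.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip edge punctuation, drop stop words."""
    tokens = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def term_statistics(documents):
    """Compute frequency, document frequency, TF-IDF, and entropy per term.

    `documents` is a list of strings, e.g. one ASR transcript per video.
    """
    counts = [Counter(tokenize(d)) for d in documents]
    n_docs = len(counts)

    total = Counter()     # occurrences of each term in the whole collection
    doc_freq = Counter()  # number of documents containing each term
    for c in counts:
        total.update(c)
        doc_freq.update(c.keys())

    stats = {}
    for term, freq in total.items():
        idf = math.log(n_docs / doc_freq[term])
        # Shannon entropy (bits) of how the term's occurrences spread over documents.
        probs = [c[term] / freq for c in counts if c[term] > 0]
        entropy = sum(-p * math.log2(p) for p in probs)
        stats[term] = {
            "freq": freq,
            "doc_freq": doc_freq[term],
            "tfidf": freq * idf,
            "entropy": entropy,
        }
    return stats


if __name__ == "__main__":
    # Made-up transcripts, used only to exercise the functions.
    transcripts = [
        "The airplane is on the runway.",
        "The airplane and the pilot.",
        "A farm in the valley.",
    ]
    ranked = sorted(term_statistics(transcripts).items(),
                    key=lambda kv: kv[1]["tfidf"], reverse=True)
    for term, s in ranked:
        print(term, s)
```

A ranked table of this kind is the sort of evidence a person could review when manually choosing candidate concepts; the selection itself remains a manual step, as the abstract states.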
