A probablistic framework for mapping audio-visual features to high-level semantics in terms of concepts and context
暂无分享,去创建一个
Multimedia content is an essential part of information technology. However, the difficulty in filtering, searching, and summarizing video has so far hindered the effective utilization of video databases. Users want to filter and query video by high-level (semantic) concepts, while automatic algorithms can extract only low-level features (e.g., color, texture, shape, amount of motion). Bridging this gap is thus the most challenging problem in video (multimedia) indexing, retrieval, and filtering.
In this thesis we develop a probabilistic framework for mapping low-level features to high-level semantics. This framework consists of representation of concepts using probabilistic multimedia objects (multijects) and representation of contextual constraints of such objects called the multinet. To extract semantics from video, we approach the problem of video understanding as a pattern recognition problem. We develop a framework for modeling the concepts and the context of video content.
Innovative claims of this research include the following: (1) Efficient and generic probabilistic methods for generating a multimedia lexicon of multijects representing semantic concepts consisting of objects, sites, and events using audio and visual features. This is achieved by using probabilistic pattern recognition techniques for fusing multiple modalities. (2) Probabilistic graphical methods for discovering relationships between different semantic concepts and using these relationships to enforce contextual constraints. This leads to enhanced detection of existing concepts and facilitates detection of concepts not observable directly
To alleviate the burden of labeling large data sets for the supervised training of the multiject models, we also present a technique to train models using a partially labeled data set. To support query by audiovisual content, we also implement an algorithm based on the principle of dynamic programming. We also incorporate relevance feedback for this paradigm of query.
The framework is flexible and generic and can therefore be applied easily to applications such as semantic video indexing and filtering, human-computer intelligent interaction, surveillance, and Internet multimedia management.