Video classification based on a low-level feature fusion model

This article presents a new system for automatically extracting high-level video concepts. The novelty of the approach lies in the feature fusion method. The system architecture is divided into three steps. The first step consists of building sensors, each made of a low-level (color or texture) descriptor and a Support Vector Machine (SVM) trained to recognize a given concept (for example, "beach" or "road"). The sensor fusion step then combines several sensors for each concept. Finally, because concepts depend on context, the concept fusion step models interactions between concepts in order to refine their predictions. The fusion method is based on the Transferable Belief Model (TBM), which offers an appropriate framework for modeling both source uncertainty and interactions between concepts. Results obtained on the TREC video evaluation protocol demonstrate the improvement provided by such a combination over single-source classification.
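As a rough illustration of the TBM-based sensor fusion described above, the Python sketch below combines the outputs of two hypothetical sensors (a color-based and a texture-based SVM score for one concept) with the unnormalized conjunctive rule, then converts the fused masses into a decision via the pignistic transformation. The score-to-mass mapping and the `reliability` discounting factor are illustrative assumptions, not the paper's exact formulation.

```python
from itertools import product

# Frame of discernment for one concept (e.g. "beach"): present / absent
C = frozenset({"c"})
NOT_C = frozenset({"not_c"})
THETA = C | NOT_C          # total ignorance
EMPTY = frozenset()        # conflict between sources

def svm_score_to_bba(score, reliability=0.8):
    """Hypothetical mapping from an SVM confidence in [0, 1] to a basic
    belief assignment; discounting by `reliability` keeps some mass on
    the whole frame to model source uncertainty (an assumption, not the
    paper's exact mapping)."""
    return {
        C: reliability * score,
        NOT_C: reliability * (1.0 - score),
        THETA: 1.0 - reliability,
    }

def conjunctive_combine(m1, m2):
    """Unnormalized conjunctive rule of the TBM: mass is allowed to flow
    to the empty set, which measures the conflict between sensors."""
    out = {}
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        out[inter] = out.get(inter, 0.0) + wa * wb
    return out

def pignistic(m):
    """Pignistic transformation: normalize away the conflict, then
    spread each focal set's mass uniformly over its singletons."""
    k = 1.0 - m.get(EMPTY, 0.0)
    bet = {"c": 0.0, "not_c": 0.0}
    for focal, w in m.items():
        for elem in focal:
            bet[elem] += (w / len(focal)) / k
    return bet

# Two sensors scoring the same concept on one shot
m_color = svm_score_to_bba(0.9)    # color-based SVM output
m_texture = svm_score_to_bba(0.6)  # texture-based SVM output
fused = conjunctive_combine(m_color, m_texture)
print(pignistic(fused))  # fused probability that the concept is present
```

This sketch covers only the per-concept sensor fusion step; the concept fusion step, which adjusts predictions using interactions between concepts, would operate on these fused outputs across concepts.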