Zhejiang University at TRECVID 2006

We participated in the high-level feature extraction and interactive search tasks of TRECVID 2006. The interaction and integration of multiple modalities such as visual, audio, and textual data are at the heart of video content analysis: each single modality expresses only part of the semantics, and the full semantics of a video emerge only when the modalities are combined. For both tasks, we exploit the temporally sequenced co-occurrence of multimodal media data in video and develop a new approach to represent the relations between separate shots, built mainly on SimFusion and Locality Preserving Projections (LPP). SimFusion is an effective algorithm for reinforcing and propagating similarity relations across modalities, while LPP is a linear dimensionality reduction method that preserves the local neighborhood structure of the data, combining the efficiency of linear projections with the locality-preserving behavior of nonlinear manifold methods. For the high-level feature extraction task, i.e., semantic concept detection, which is essentially a pattern recognition problem, we use an SVM classifier to perform detection. For interactive search, we use relevance feedback to refine the search results. We submitted one run for each task.
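
A minimal sketch of this pipeline is shown below: a simplified SimFusion-style similarity reinforcement over a shot-level relationship matrix, an LPP projection obtained from the standard generalized eigenproblem, and an SVM concept detector. The matrix construction, function names, library choices (NumPy, SciPy, scikit-learn), and parameter values are illustrative assumptions and do not reproduce the exact system submitted to TRECVID.

```python
# Illustrative sketch, not the submitted system: simplified SimFusion-style
# similarity reinforcement, LPP projection, and an SVM concept detector.
import numpy as np
from scipy.linalg import eigh
from sklearn.svm import SVC


def simfusion_reinforce(urm, n_iter=5):
    """Propagate similarities over a unified relationship matrix (URM)
    built from visual, audio, and textual links between shots.
    Simplified update: S <- L S L^T with L the row-normalized URM."""
    L = urm / np.maximum(urm.sum(axis=1, keepdims=True), 1e-12)
    S = np.eye(urm.shape[0])
    for _ in range(n_iter):
        S = L @ S @ L.T
    return S


def lpp(X, S, dim=32):
    """Locality Preserving Projections: a linear map that keeps shots
    with high similarity S close in the low-dimensional space.
    Solves the generalized eigenproblem X^T L X a = lambda X^T D X a."""
    D = np.diag(S.sum(axis=1))
    Lap = D - S                                    # graph Laplacian
    A = X.T @ Lap @ X
    B = X.T @ D @ X + 1e-6 * np.eye(X.shape[1])    # regularize for stability
    vals, vecs = eigh(A, B)
    return vecs[:, :dim]                           # smallest eigenvalues first


# Toy usage: 200 shots, 500-dim multimodal features, one binary concept.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))
urm = np.abs(rng.normal(size=(200, 200)))
urm = (urm + urm.T) / 2                            # symmetric shot-shot relations

S = simfusion_reinforce(urm)
P = lpp(X, S, dim=32)
Z = X @ P                                          # low-dimensional shot representation

y = rng.integers(0, 2, size=200)                   # toy concept labels
clf = SVC(kernel="rbf", probability=True).fit(Z, y)
scores = clf.predict_proba(Z)[:, 1]                # per-shot concept detection scores
```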