Content-based video retrieval: does video's semantic visual feature matter?

A new shot level video browsing method based on semantic visual features (e.g., car, mountain, and fire) is proposed to facilitate content-based retrieval. The video's binary semantic feature vector is utilized to calculate the score of similarity between two shot keyframes. The score is then used to browse the "similar" keyframes in terms of semantic visual features. A pilot user study was conducted to better understand users' behaviors in video retrieval context. Three video retrieval and browsing systems are compared: temporal neighbor, semantic visual feature, and fused browsing system. The initial results indicated that the semantic visual feature browsing was effective and efficient for Visual Centric tasks, but not for Non-visual Centric tasks.