CRIM AT TRECVID-2011: Content-Based Copy Detection using Nearest-Neighbor Mapping

We report results on content-based audio and video copy detection for the TRECVID 2011 CBCD evaluation using nearest-neighbor mapping. Nearest-neighbor mapping was used successfully for audio copy detection in TRECVID 2009, with excellent results (min NDCR of 0.06 averaged over all seven transforms for the actual no FA case). For this reason, we decided to implement nearest-neighbor mapping for video copy detection as well. For video copy detection using nearest-neighbor mapping, the idea is to first map each video frame of the test to the closest query video frame. We then move the query over the test to find the test segment with the highest number of matching frames. This nearest-neighbor mapping led to good matching scores even when the query video was distorted and contained occlusions. We test these algorithms on the TRECVID 2009 and 2010 content-based copy detection evaluation data. For both tasks, nearest-neighbor video copy detection gives a minimum normalized detection cost rate (min NDCR) comparable to that achieved with audio copy detection for the same task. We augment audio copy detection by using three different feature parameters: MFCC, equalized MFCC, and Gaussianized MFCC. Pooling the results from the three feature parameters gives the lowest miss rate, and combining them with video copy detection significantly improves the audio+video copy detection results. For the TRECVID 2011 CBCD evaluation, we obtain the lowest min NDCR for 25 out of 56 transforms for the actual no FA case. All our runs (V48A66T58B, V48A66T65B, V48A66T160, V48A66T60) were the same except for the thresholds.
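As a rough illustration of the nearest-neighbor mapping described above, the sketch below (a simplification for exposition, not the exact implementation) maps each test frame to its closest query frame by Euclidean distance over hypothetical per-frame feature vectors, then slides the query over the test to find the offset with the largest number of aligned matches. The function names and the choice of distance are assumptions made for this example.

```python
import numpy as np

def nn_map(test_feats, query_feats):
    """Map each test frame to the index of its nearest query frame
    (Euclidean distance over per-frame feature vectors)."""
    # pairwise squared distances, shape (n_test, n_query)
    d = ((test_feats[:, None, :] - query_feats[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def best_segment(nn_idx, n_query):
    """Slide the query over the test and count frames whose nearest
    query frame lines up with the current offset; return the offset
    with the most matching frames and that match count."""
    n_test = len(nn_idx)
    best_off, best_count = 0, -1
    for off in range(n_test - n_query + 1):
        count = int(np.sum(nn_idx[off:off + n_query] == np.arange(n_query)))
        if count > best_count:
            best_off, best_count = off, count
    return best_off, best_count

# Example usage with random features standing in for real frame descriptors:
# test_feats = np.random.rand(1000, 64); query_feats = np.random.rand(120, 64)
# off, score = best_segment(nn_map(test_feats, query_feats), len(query_feats))
```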