Video Semantic Indexing Using Image Classification

This notebook paper summarizes Team NEC-UIUC’s approaches to the Semantic Indexing task of the TRECVid 2010 evaluation. Our submissions rely mainly on advanced image classification methods that encode local features with local coordinate coding (LCC), with the computation distributed using the Hadoop framework. For every video shot, we sample key frames evenly and extract dense local features, including DHOG and local binary patterns (LBP), which are then encoded with LCC. Next, for every concept, we train large-scale linear SVM classifiers on spatial-pyramid LCC features. Finally, we use multiple instance learning to rank the video shots according to the SVM scores of their individual frames. Our systems achieve a mean extended inferred average precision (mean xinfAP) of 7.40% on the 30 concepts evaluated by NIST, and a mean average precision of 28.63% on all 130 concepts when one fifth of the development data is held out as the validation set.
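To make the frame-to-shot pipeline concrete, the following is a minimal pure-Python sketch, not the team's implementation. It assumes several things the abstract does not specify: the LCC step is approximated with the locality-constrained linear coding (LLC) closed form over the k nearest anchor points, pooling is max pooling over a 1x1 plus 2x2 spatial pyramid, and the multiple instance learning step scores each shot by its best frame (the standard "a bag is positive if any instance is positive" assumption). The functions `local_code`, `spatial_pyramid`, `svm_score`, and `rank_shots` and the parameters `k` and `reg` are hypothetical names for illustration; SVM training and the Hadoop distribution are not shown.

```python
# Hypothetical sketch of the frame-to-shot pipeline. Assumptions (not from the
# paper): LLC-style approximate coding with k nearest anchors, max pooling over
# a 1x1 + 2x2 spatial pyramid, and max-over-frames shot scoring.


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def local_code(x, anchors, k=2, reg=1e-4):
    """Code descriptor x with its k nearest anchors; weights sum to one
    (LLC-style approximation of local coordinate coding, an assumption)."""
    def sq_dist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    nearest = sorted(range(len(anchors)), key=lambda j: sq_dist(x, anchors[j]))[:k]
    diffs = [[p - q for p, q in zip(anchors[j], x)] for j in nearest]
    cov = [[sum(p * q for p, q in zip(di, dj)) for dj in diffs] for di in diffs]
    trace = sum(cov[i][i] for i in range(len(nearest)))
    for i in range(len(nearest)):
        cov[i][i] += reg * trace + 1e-12  # makes cov positive definite, so sum(w) > 0
    w = solve(cov, [1.0] * len(nearest))
    total = sum(w)
    code = [0.0] * len(anchors)
    for j, wj in zip(nearest, w):
        code[j] = wj / total  # enforce the sum-to-one constraint
    return code


def spatial_pyramid(codes, positions, levels=(1, 2)):
    """Max-pool codes inside each pyramid cell and concatenate all cells.
    positions are (x, y) in [0, 1]; cells start at zero, so empty cells stay zero."""
    dim = len(codes[0])
    feature = []
    for n in levels:
        cells = [[0.0] * dim for _ in range(n * n)]
        for code, (px, py) in zip(codes, positions):
            idx = min(int(py * n), n - 1) * n + min(int(px * n), n - 1)
            cells[idx] = [max(a, b) for a, b in zip(cells[idx], code)]
        for cell in cells:
            feature.extend(cell)
    return feature


def svm_score(weights, bias, feature):
    """Decision value of a trained linear SVM: w . f + b."""
    return sum(w * f for w, f in zip(weights, feature)) + bias


def rank_shots(shots, weights, bias):
    """shots maps shot id -> list of frame features; a shot scores as its best frame."""
    scores = {sid: max(svm_score(weights, bias, f) for f in frames)
              for sid, frames in shots.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

For example, with weights `[1.0, -1.0]` and zero bias, a shot whose frames score 0.2 and 0.8 outranks a shot with a single frame scoring 0.6 under max-over-frames scoring, whereas averaging the frame scores would reverse that order. Taking the maximum reflects the intuition that a concept may be visible in only a few key frames of a shot.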