Multi-Lingual Broadcast News Retrieval

In this notebook paper we describe the technical details of the CMU Informedia team's submissions to TRECVID 2006. We participated in the high-level feature extraction task and in the search task (automatic and interactive). Our emphasis is on the techniques used for the search task, where our interactive runs placed first in the interactive track and our automatic runs were among the top performers in the automatic track.

1 High-level feature extraction

We submitted 6 runs for the TRECVID 2006 high-level feature evaluation, as shown in Table 1. There were 61901 labeled shots for the 39 concepts of the LSCOM-Lite set. We split these labeled shots into a training set (45963 shots) and a fusion set (15938 shots). We use the training set to train our baseline classifiers on various combinations of low-level features. The baseline classifiers are support vector machines (SVMs) with a radial basis function (RBF) kernel. In our experience, the SVM parameter settings are critical to performance. We therefore perform a linear search of the parameter space, using cross-validation on the training set, to find the optimal parameters for each concept, in particular the gamma parameter of the kernel function and the cost parameter. In Run 1, the optimal parameter settings achieve an average improvement of 27% (0.2633 to 0.3352) over the default settings in mean average precision (MAP) on the 39 concepts in the cross-validation experiment.
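The per-concept parameter search described above can be illustrated with a minimal sketch. It is not the implementation used for the submitted runs: the fold count, the candidate values, the exhaustive sweep over (gamma, cost) pairs, and every function name here are assumptions made for illustration. In particular, `score_fn` is a hypothetical callback standing in for "train an RBF SVM on the training indices and return average precision on the held-out indices".

```python
from itertools import product


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous (train, held_out) index pairs."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        held_out = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        folds.append((train, held_out))
        start += size
    return folds


def search_parameters(score_fn, n_items, gammas, costs, k=3):
    """Return (gamma, cost, mean_score) with the best cross-validated score.

    score_fn(gamma, cost, train, held_out) is a hypothetical callback; in a
    real system it would train an SVM and score it on the held-out fold.
    """
    folds = k_fold_indices(n_items, k)
    best = None
    for gamma, cost in product(gammas, costs):
        mean_score = sum(score_fn(gamma, cost, tr, ho) for tr, ho in folds) / k
        if best is None or mean_score > best[2]:
            best = (gamma, cost, mean_score)
    return best


def relative_improvement(baseline, tuned):
    """Relative gain of the tuned score over the baseline score."""
    return (tuned - baseline) / baseline


# MAP figures reported in the text for Run 1 (default vs. tuned parameters).
print(f"{relative_improvement(0.2633, 0.3352):.1%}")
```

Running the search once per concept, rather than once globally, matches the text's statement that optimal parameters are found for each concept; the last line checks that the reported MAP values correspond to the stated improvement of roughly 27%.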