Columbia University TRECVID 2007 High-Level Feature Extraction

One difficulty in this year's high-level feature (HLF) extraction task was the change of the applied domain from news video to foreign documentary video. Classifiers trained in prior years performed poorly when applied naively, and classifiers trained on the 2007 data alone may suffer from too few positive training samples. This year we address a new fundamental problem: how to efficiently and effectively adapt models learned from an old domain to a significantly different one. This topic complements the scalability issue discussed in TRECVID 2006: how to leverage a large pool of concept detectors (e.g., Columbia 374) to improve the accuracy of individual detectors. We developed and tested a new cross-domain SVM (CDSVM) algorithm that adapts previously learned support vectors from one domain to help classification in another. The performance gain is obtained with almost no additional computational cost. We also conduct a comprehensive comparative study of state-of-the-art SVM-based cross-domain learning methods. To further understand the underlying contributing factors, we propose an intuitive selection criterion to determine which cross-domain learning method to use for each concept. Such a prediction mechanism is important because there are many promising methods for adapting old models to new domains, and judicious selection is key to applying the right method in the right context (e.g., size of training data in the new and old domains, variation of content between the two domains). Although no single method universally outperforms the others, with adequate prediction mechanisms we are able to apply the right adaptation approach under different conditions and demonstrate a 22% performance improvement for mid-frequency or rare concepts.
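The idea of reusing previously learned support vectors in a new domain can be illustrated with a minimal sketch. The abstract does not specify how CDSVM weights the old support vectors, so the Gaussian proximity weighting below, the parameter `beta`, and the function names `rbf_proximity` and `build_adapted_training_set` are all assumptions made for illustration: each old support vector receives a weight that grows with its average similarity to the new-domain samples, and the weighted vectors are pooled with the new-domain data.

```python
import math


def rbf_proximity(v, target_points, beta=1.0):
    """Average Gaussian similarity between one old support vector and
    the new-domain samples (assumed weighting, not taken from the text)."""
    total = 0.0
    for x in target_points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(v, x))
        total += math.exp(-beta * sq_dist)
    return total / len(target_points)


def build_adapted_training_set(old_svs, old_labels, new_points, new_labels, beta=1.0):
    """Pool new-domain samples (weight 1.0) with old support vectors
    that are down-weighted by their distance from the new domain."""
    samples, labels, weights = [], [], []
    for x, y in zip(new_points, new_labels):
        samples.append(x)
        labels.append(y)
        weights.append(1.0)
    for v, y in zip(old_svs, old_labels):
        samples.append(v)
        labels.append(y)
        weights.append(rbf_proximity(v, new_points, beta))
    return samples, labels, weights


# Hypothetical toy data: two old support vectors, two new-domain samples.
old_svs = [(0.0, 0.0), (5.0, 5.0)]
old_labels = [1, -1]
new_points = [(0.0, 0.0), (1.0, 0.0)]
new_labels = [1, -1]

samples, labels, weights = build_adapted_training_set(
    old_svs, old_labels, new_points, new_labels
)
```

The resulting weights could then be passed as per-sample weights to a standard SVM trainer, so that old support vectors far from the new-domain data contribute little to the adapted decision boundary. Because only the old support vectors are reused, rather than the full old training set, the extra training cost stays small.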
