Paper Information - Learning to Adapt Across Multimedia Domains

Learning to Adapt Across Multimedia Domains

In multimedia, machine learning techniques are often used to build models that map low-level feature vectors to semantic labels. Because data such as images and videos come from a variety of domains (e.g., genres, sources) with different distributions, adapting models trained on one domain to other domains can improve performance and reduce computational and human cost. In this thesis, we focus on a generic adaptation setting in multimedia, in which supervised classifiers trained on one or more auxiliary domains are adapted into a new classifier that works well on a target domain with limited labeled examples. Our main contribution is a discriminative framework for function-level classifier adaptation based on regularized loss minimization, which adapts classifiers of any type by modifying their decision functions in an efficient and principled way. Two adaptation algorithms derived from this general framework, adaptive support vector machines (aSVM) and adaptive kernel logistic regression (aKLR), are discussed in detail. We further extend this framework by integrating domain analysis approaches that measure and weight the utility of auxiliary domains, and sample selection methods that identify informative examples to help the adaptation process. The proposed approaches are evaluated on cross-domain video concept detection using the TRECVID corpus, where preliminary experiments have shown promising results. Our general approaches can also be applied to other adaptation problems, including retrieval model adaptation and cross-corpus text categorization.

Thesis Committee: Alexander G. Hauptmann (Chair), Christos Faloutsos, Jie Yang, Shih-Fu Chang (Columbia University)

Jun Yang
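
The function-level adaptation described in the abstract can be illustrated with a small sketch: an auxiliary decision function is kept fixed, and only an additive perturbation is learned from a few labeled target examples by regularized loss minimization. Everything specific here is an assumption made for illustration, not the thesis's aSVM or aKLR algorithm: the perturbation is linear, the loss is the hinge loss with an L2 penalty, the optimizer is plain subgradient descent, and the names `adapt_decision_function`, `f_aux`, `X_target`, and `y_target` as well as the toy data are hypothetical.

```python
def adapt_decision_function(f_aux, X, y, lam=0.1, lr=0.05, epochs=2000):
    """Return f(x) = f_aux(x) + w.x + b, with (w, b) fit on target data.

    Assumed objective (illustrative only): mean hinge loss of the adapted
    score on the target examples plus (lam / 2) * ||w||^2, minimized by
    full-batch subgradient descent. f_aux itself is never modified.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = f_aux(xi) + sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # hinge loss is active for this example
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b

    def f_adapted(x):
        # Adapted score: auxiliary score plus the learned perturbation.
        return f_aux(x) + sum(wj * xj for wj, xj in zip(w, x)) + b

    return f_adapted


# Hypothetical auxiliary classifier: predicts positive when x[0] > 0.
def f_aux(x):
    return x[0]


# Toy target domain whose boundary sits near x[0] = 1 instead of 0.
X_target = [[0.2], [0.5], [1.5], [2.0]]
y_target = [-1, -1, 1, 1]

f_target = adapt_decision_function(f_aux, X_target, y_target)
for x, label in zip(X_target, y_target):
    print(x, label, round(f_aux(x), 2), round(f_target(x), 2))
```

In this toy setup the auxiliary classifier scores the two negative target examples as positive, while the adapted function shifts the boundary using only four labeled points. Because the auxiliary model is accessed only through its scores, the same sketch would apply to an auxiliary classifier of any type.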
