Model recommendation for action recognition

Simply choosing one model out of a large set of possibilities for a given vision task is a surprisingly difficult problem, especially if there is limited evaluation data with which to distinguish among models, such as when choosing the best “walk” action classifier from a large pool of classifiers tuned for different viewing angles, lighting conditions, and background clutter. In this paper we suggest that this problem of selecting a good model can be recast as a recommendation problem, where the goal is to recommend a good model for a particular task based on how well a limited probe set of models appears to perform. Through this conceptual remapping, we can bring to bear all the collaborative filtering techniques developed for consumer recommender systems (e.g., Netflix, Amazon.com). We test this hypothesis on action recognition, and find that even when every model has been directly rated on a training set, recommendation finds better selections for the corresponding test set than the best performers on the training set.

[1]  Yehuda Koren,et al.  Factor in the neighbors: Scalable and accurate collaborative filtering , 2010, TKDD.

[2]  Ingo Mierswa,et al.  Efficient Case Based Feature Construction for Heterogeneous Learning Tasks , 2006 .

[3]  Michael I. Jordan,et al.  Multi-task feature selection , 2006 .

[4]  Yehuda Koren,et al.  The BellKor Solution to the Netflix Grand Prize , 2009 .

[5]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[6]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[7]  Stefan Kramer,et al.  Kernel-Based Inductive Transfer , 2008, ECML/PKDD.

[8]  Rich Caruana,et al.  Multitask Learning , 1998, Encyclopedia of Machine Learning and Data Mining.

[9]  John Blitzer,et al.  Domain Adaptation with Structural Correspondence Learning , 2006, EMNLP.

[10]  Cordelia Schmid,et al.  Learning realistic human actions from movies , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Antonin Chambolle,et al.  A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging , 2011, Journal of Mathematical Imaging and Vision.

[12]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[13]  Massimiliano Pontil,et al.  Multi-Task Feature Learning , 2006, NIPS.

[14]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[15]  Daphne Koller,et al.  Learning a meta-level prior for feature relevance from multiple related tasks , 2007, ICML '07.

[16]  Rama Chellappa,et al.  Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.

[17]  James Bennett,et al.  The Netflix Prize , 2007 .

[18]  Yihong Gong,et al.  Training Hierarchical Feed-Forward Visual Recognition Models Using Transfer Learning from Pseudo-Tasks , 2008, ECCV.

[19]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[20]  Martial Hebert,et al.  Feature seeding for action recognition , 2011, 2011 International Conference on Computer Vision.

[21]  Trevor Darrell,et al.  Adapting Visual Category Models to New Domains , 2010, ECCV.

[22]  Ingo Mierswa,et al.  Efficient Feature Construction by Meta Learning – Guiding the Search in Meta Hypothesis Space , 2005 .