Model recommendation for action recognition and other applications

The typical approach to learning-based vision has been to learn classifiers or detectors anew for each individual application, using training data annotated for that specific task. However, classifiers trained in this way tend to be brittle and highly specialized to the datasets from which they are derived, which makes them difficult to transfer between tasks. While multi-task learning and domain adaptation techniques address some of these problems on a theoretical level, from a practical standpoint they are just as complicated and labor-intensive as the simpler learning techniques they supplant. Suppose instead that these specialized classifiers had simply been collected into a library: while it is unlikely that any specific classifier would generalize well to a new dataset, there may exist some classifier in the library that is tuned to the same conditions as the new task. This thesis addresses the fundamental question of how to efficiently select a good classifier from such a library. Specifically, it demonstrates that collaborative filtering techniques (such as those employed by recommender systems like Netflix and Amazon.com) can be used to recommend models appropriate for a specific target task. These recommendations are made by trying, or rating, a small subset of models on the target task, and then using that small set of ratings, together with the ratings of the models on other tasks, to predict the ratings of the unevaluated models on the target task. This process, which we term "model recommendation", is applied to action recognition and other vision and robotics applications, and the subtle differences between model recommendation and typical recommender systems are used to derive novel algorithms and extensions to the core recommendation concept.
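
One way to make the rating-prediction step concrete is user-based nearest-neighbor collaborative filtering, with tasks playing the role of users and library models the role of items. The sketch below is a minimal illustration under stated assumptions, not the thesis's own algorithm: the ratings of every library model on the training tasks are assumed to be fully observed, the ratings themselves are hypothetical numbers (for example, accuracies), and the names `pearson`, `recommend`, and the parameter `k` are introduced here for illustration only.

```python
import math
from typing import Dict, List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length rating vectors (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def recommend(library: List[List[float]], probed: Dict[int, float], k: int = 2) -> Dict[int, float]:
    """Predict target-task ratings for every model that has not been probed.

    library[t][m]: hypothetical rating of model m on training task t (tasks act as "users").
    probed: model index -> rating measured on the target task (the "new user").
    """
    idx = sorted(probed)
    target = [probed[m] for m in idx]
    target_mean = sum(target) / len(target)

    # Rank training tasks by how similarly they rated the probed models.
    scored = []
    for row in library:
        sim = pearson([row[m] for m in idx], target)
        if sim > 0:  # keep only positively correlated tasks
            scored.append((sim, row))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbors = scored[:k]

    preds: Dict[int, float] = {}
    for m in range(len(library[0])):
        if m in probed:
            continue
        if not neighbors:
            # No informative neighbor: fall back to the target's mean probed rating.
            preds[m] = target_mean
            continue
        # Mean-centered, similarity-weighted average of the neighbors' ratings.
        num = sum(sim * (row[m] - sum(row[i] for i in idx) / len(idx))
                  for sim, row in neighbors)
        den = sum(sim for sim, _ in neighbors)
        preds[m] = target_mean + num / den
    return preds


# Hypothetical library: 3 training tasks x 5 models.
library = [
    [0.9, 0.2, 0.6, 0.8, 0.3],
    [0.8, 0.3, 0.5, 0.7, 0.2],
    [0.2, 0.9, 0.4, 0.3, 0.8],
]
probed = {0: 0.85, 1: 0.25, 2: 0.55}  # only models 0-2 were tried on the target task
preds = recommend(library, probed, k=2)
best = max(preds, key=preds.get)  # model 3 is predicted best here
```

Neighborhood methods like this are among the simplest collaborative filtering approaches; latent-factor (matrix factorization) models are a common alternative that would replace the neighbor-weighting step with a learned low-rank fit.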
