Online structured learning for real-time computer vision gaming applications

In recent years, computer vision has played an increasingly important role in the development of computer games, and it now features as one of the core technologies for many gaming platforms. The work in this thesis addresses three problems in real-time computer vision, all of which are motivated by their potential application to computer games. We first present an approach for real-time 2D tracking of arbitrary objects. In common with recent research in this area, we incorporate online learning to provide an appearance model which is able to adapt to the target object and its surrounding background during tracking. However, our approach moves beyond the standard framework of tracking using binary classification and instead integrates tracking and learning in a more principled way through the use of structured learning. As well as providing a more powerful framework for adaptive visual object tracking, our approach also outperforms state-of-the-art tracking algorithms on standard datasets. Next, we consider the task of keypoint-based object tracking. We take the traditional pipeline of matching keypoints followed by geometric verification and show how this can be embedded into a structured learning framework in order to provide principled adaptivity to a given environment. We also propose an approximation method allowing us to take advantage of recently developed binary image descriptors, meaning our approach is suitable for real-time application even on low-powered portable devices. Experimentally, we clearly see the benefit that online adaptation using structured learning can bring to this problem. Finally, we present an approach for approximately recovering the dense 3D structure of a scene which has been mapped by a simultaneous localisation and mapping system. Our approach is guided by the constraints of the low-powered portable hardware we are targeting, and we develop a system which coarsely models the scene using a small number of planes. To achieve this, we frame the task as a structured prediction problem and introduce online learning into our approach to provide adaptivity to a given scene. This allows us to use relatively simple multi-view information coupled with online learning of appearance to efficiently produce coarse reconstructions of a scene.
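
As a rough illustration of what distinguishes structured learning from binary classification in the tracking setting, the sketch below scores candidate bounding boxes directly and learns from a loss based on their overlap with the target box. It is a minimal sketch under stated assumptions, not the method of the thesis: the names `overlap`, `predict` and `update`, the linear scoring function, the step size `lr`, and the perceptron-style update (used here in place of any particular structured SVM solver) are all hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height); hypothetical layout
FeatureFn = Callable[[Box], Sequence[float]]


def overlap(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def dot(w: Sequence[float], f: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def predict(w: Sequence[float], features: FeatureFn,
            candidates: Sequence[Box]) -> Box:
    """Structured prediction: return the candidate box with the highest score."""
    return max(candidates, key=lambda box: dot(w, features(box)))


def update(w: Sequence[float], features: FeatureFn, candidates: Sequence[Box],
           target: Box, lr: float = 0.1) -> List[float]:
    """One online step (assumed perceptron-style update, for illustration only).

    Finds the candidate that most violates the margin, where the required
    margin grows as the candidate's overlap with the target shrinks, and
    moves the weights towards the target's features if the margin is violated.
    """
    worst = max(
        candidates,
        key=lambda box: dot(w, features(box)) + 1.0 - overlap(target, box),
    )
    if worst == target:
        return list(w)
    f_target, f_worst = features(target), features(worst)
    margin = dot(w, f_target) - dot(w, f_worst)
    loss = 1.0 - overlap(target, worst)
    if margin >= loss:
        return list(w)
    return [wi + lr * (a - b) for wi, a, b in zip(w, f_target, f_worst)]
```

In a tracker built along these lines, `predict` would be called on each new frame with candidate boxes around the previous location, and `update` would then be called with the predicted box as the target, so that no intermediate step of labelling samples as positive or negative is needed.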
