Learning Online Smooth Predictors for Realtime Camera Planning Using Recurrent Decision Trees

We study the problem of online prediction for realtime camera planning, where the goal is to predict smooth trajectories that correctly track and frame objects of interest (e.g., players in a basketball game). The conventional approach for training predictors does not directly consider temporal consistency, and often produces undesirable jitter. Although post-hoc smoothing (e.g., via a Kalman filter) can mitigate this issue to some degree, it is not ideal due to overly stringent modeling assumptions (e.g., Gaussian noise). We propose a recurrent decision tree framework that can directly incorporate temporal consistency into a data-driven predictor, as well as a learning algorithm that can efficiently learn such temporally smooth models. Our approach does not require any post-processing, making online smooth predictions much easier to generate when the noise model is unknown. We apply our approach to sports broadcasting: given noisy player detections, we learn where the camera should look based on human demonstrations. Our experiments exhibit significant improvements over conventional baselines and showcase the practicality of our approach.
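The idea of folding temporal consistency into the predictor itself, rather than smoothing afterwards, can be illustrated with a minimal sketch. It is not the paper's recurrent decision tree or its learning algorithm: `recurrent_rollout`, `toy_predictor`, `jitter`, and the blending `weight` are all hypothetical names and choices made for illustration. The only point shown is the recurrent structure, in which each online prediction receives the model's own recent outputs as additional input.

```python
from typing import Callable, List, Sequence


def recurrent_rollout(
    predictor: Callable[[Sequence[float], Sequence[float]], float],
    features: Sequence[Sequence[float]],
    history_len: int = 2,
    init: float = 0.0,
) -> List[float]:
    """Run a predictor online, feeding its own recent outputs back as input."""
    history = [init] * history_len  # assumed initial camera state
    outputs: List[float] = []
    for x_t in features:
        y_t = predictor(x_t, history)    # uses current frame and past outputs
        outputs.append(y_t)
        history = history[1:] + [y_t]    # slide the window of past outputs
    return outputs


def toy_predictor(x_t: Sequence[float], history: Sequence[float],
                  weight: float = 0.25) -> float:
    """Hypothetical stand-in for a learned model (assumption, not the paper's).

    Blends a noisy per-frame estimate with the most recent output.
    """
    raw = sum(x_t) / len(x_t)  # e.g. mean of noisy player positions
    return weight * raw + (1.0 - weight) * history[-1]


def jitter(seq: Sequence[float]) -> float:
    """Mean absolute frame-to-frame change of a trajectory."""
    return sum(abs(b - a) for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


if __name__ == "__main__":
    # Alternating detections mimic a noisy per-frame signal.
    frames = [[float(t % 2)] for t in range(20)]
    per_frame = [sum(x) / len(x) for x in frames]   # no temporal context
    recurrent = recurrent_rollout(toy_predictor, frames)
    print("per-frame jitter:", jitter(per_frame))
    print("recurrent jitter:", jitter(recurrent))
```

In this toy setting the recurrent rollout changes less from frame to frame than the independent per-frame estimates, and it does so online, using only past outputs. A learned model would replace `toy_predictor` and would have to be trained on the outputs it produces itself, which is the harder problem the paper addresses.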
