Enabling more accurate and efficient structured prediction

Machine learning practitioners often face a fundamental trade-off between expressiveness and computation time: more accurate, expressive models tend to be more computationally intensive, both at training and at test time. While this trade-off arises throughout machine learning, it is acutely present in structured prediction, where jointly predicting multiple output variables creates two primary, inter-related bottlenecks: inference time and feature computation time. In this thesis, we address this trade-off at test time by presenting frameworks that enable more accurate and efficient structured prediction, each targeting one of these bottlenecks. First, we develop a framework based on a cascade of models, whose goal is to control test-time complexity even as features are added that increase inference time (potentially exponentially). We call this framework Structured Prediction Cascades (SPC); we develop SPC in the context of exact inference and then extend it to the approximate case. Next, for the setting where feature computation is explicitly the bottleneck, we develop a framework that learns to selectively evaluate features within an instance of the model. This second framework, Dynamic Structured Model Selection (DMS), is again developed first for a simpler, restricted model before being extended to a much more complex setting. In both cases, we evaluate our methods on several benchmark datasets and find that it is possible to dramatically improve both the efficiency and the accuracy of structured prediction.
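To make the cascade idea concrete, the following is a minimal illustrative sketch (not the thesis's exact formulation) of one pruning step for a chain-structured model: max-marginals are computed by max-product message passing, and states are kept only if their max-marginal clears a threshold formed as a convex combination of the best sequence score and the mean max-marginal. The function names, the simple unary/pairwise parameterization, and the `alpha` trade-off parameter are assumptions made for illustration.

```python
import numpy as np

def max_marginals_chain(unary, pairwise):
    """Max-marginal of each (position, state) pair in a chain model,
    i.e. the score of the best full sequence constrained to pass through it,
    computed with max-product forward/backward messages."""
    T, K = unary.shape
    fwd = np.zeros((T, K))  # fwd[t, k]: best score of a prefix ending in state k at t
    bwd = np.zeros((T, K))  # bwd[t, k]: best score of a suffix starting after state k at t
    fwd[0] = unary[0]
    for t in range(1, T):
        fwd[t] = unary[t] + np.max(fwd[t - 1][:, None] + pairwise, axis=0)
    for t in range(T - 2, -1, -1):
        bwd[t] = np.max(pairwise + (unary[t + 1] + bwd[t + 1])[None, :], axis=1)
    return fwd + bwd

def cascade_prune(max_marg, alpha=0.5):
    """Keep (position, state) pairs whose max-marginal clears a convex
    combination of the best score and the mean max-marginal; the surviving
    mask defines the reduced state space passed to the next cascade level."""
    thresh = alpha * max_marg.max() + (1 - alpha) * max_marg.mean()
    return max_marg >= thresh

# Toy example: 2 positions, 2 states, no pairwise interactions.
unary = np.array([[1.0, 0.0],
                  [0.0, 1.0]])
mm = max_marginals_chain(unary, np.zeros((2, 2)))
mask = cascade_prune(mm, alpha=1.0)  # strictest setting: keep only optimal states
```

Later cascade levels would then evaluate richer (more expensive) features only over the states that survive the mask, which is how the cascade keeps inference tractable as feature complexity grows.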
