Fall 2010 Colloquium

Structured prediction tasks pose a fundamental bias-computation trade-off: on the one hand, complex models are needed to increase predictive power; on the other, computational resources for inference in exponentially sized output spaces are limited. We formulate and develop structured prediction cascades to address this trade-off: a sequence of increasingly complex models that progressively filter the space of possible outputs. We represent an exponentially large set of filtered outputs using max-marginals and propose a novel convex loss for learning cascades that balances filtering error against filtering efficiency. We derive generalization bounds for the error and efficiency losses and evaluate our approach on several natural language and vision problems: handwriting recognition, part-of-speech tagging, and articulated pose estimation in images and videos. We find that the learned cascades can reduce the complexity of inference by up to several orders of magnitude, enabling the use of models that incorporate higher-order dependencies and features and yield significantly higher accuracy.
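
To make the filtering step concrete, the following is a minimal sketch, not the authors' code, of max-marginal filtering in a linear-chain model. The score arrays `unary` and `pair`, the function names, and the threshold form tau = alpha * (best score) + (1 - alpha) * (mean max-marginal) are illustrative assumptions about one plausible instantiation, not an implementation from the talk.

```python
# Hypothetical sketch of max-marginal filtering for a chain model.
# Assumed inputs: unary (n x K) per-position state scores and
# pair ((n-1) x K x K) transition scores between adjacent positions.
import numpy as np

def max_marginals(unary, pair):
    """Viterbi-style forward/backward max messages for a chain.

    Returns an (n, K) array m, where m[i, y] is the score of the best
    full sequence constrained to pass through state y at position i.
    """
    n, K = unary.shape
    fwd = np.zeros((n, K))
    bwd = np.zeros((n, K))
    fwd[0] = unary[0]
    for i in range(1, n):
        # best score of any prefix ending in each state at position i
        fwd[i] = unary[i] + (fwd[i - 1][:, None] + pair[i - 1]).max(axis=0)
    for i in range(n - 2, -1, -1):
        # best score of any suffix continuing from each state at position i
        bwd[i] = (pair[i] + unary[i + 1][None, :] + bwd[i + 1][None, :]).max(axis=1)
    return fwd + bwd

def filter_states(unary, pair, alpha=0.5):
    """Keep states whose max-marginal clears the cascade threshold,
    here assumed to interpolate the best score and the mean max-marginal."""
    m = max_marginals(unary, pair)
    tau = alpha * m.max() + (1 - alpha) * m.mean()
    return m >= tau  # boolean mask of surviving (position, state) pairs
```

The mask returned by `filter_states` restricts the state space that the next, more expensive model in the cascade must search; sweeping `alpha` toward 1 filters more aggressively, trading filtering error for efficiency.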