Predict and Constrain: Modeling Cardinality in Deep Structured Prediction

Many machine learning problems require predicting multi-dimensional labels, and such structured prediction models can benefit from modeling dependencies between labels. Recently, several deep learning approaches to structured prediction have been proposed. Here we focus on capturing cardinality constraints in such models: namely, constraining the number of non-zero labels that the model outputs. Such constraints have proven very useful in previous structured prediction approaches, but introducing them into a deep learning framework is challenging. We show how to do so via a novel deep architecture. Our approach outperforms strong baselines, achieving state-of-the-art results on multi-label classification benchmarks.
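To make the notion of a cardinality constraint concrete, the sketch below shows the simplest hard version: the Euclidean projection of a score vector onto the set of vectors with at most k non-zero entries, which amounts to keeping the k largest-magnitude scores and zeroing the rest. This is an illustrative, assumed formulation (function name and interface are hypothetical), not the paper's actual architecture, which integrates such constraints into end-to-end training.

```python
import numpy as np

def project_cardinality(scores, k):
    """Euclidean projection onto {x : at most k non-zero entries}.

    Keeps the k largest-magnitude entries of `scores` and zeros the rest.
    Illustrative only; a hard stand-in for the differentiable constraint
    handling used inside a deep structured prediction model.
    """
    scores = np.asarray(scores, dtype=float)
    if k >= scores.size:
        return scores.copy()
    projected = np.zeros_like(scores)
    # Indices of the k entries with the largest absolute value.
    top_k = np.argpartition(np.abs(scores), -k)[-k:]
    projected[top_k] = scores[top_k]
    return projected
```

For example, projecting the label scores `[0.9, -0.1, 0.8, 0.3]` with k = 2 retains only the entries 0.9 and 0.8, enforcing that at most two labels can be active.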
