The Limited Multi-Label Projection Layer

We propose the Limited Multi-Label (LML) projection layer as a new primitive operation for end-to-end learning systems. The LML layer provides a probabilistic way of modeling multi-label predictions limited to having exactly k labels. We derive efficient forward and backward passes for this layer and show how the layer can be used to optimize the top-k recall for multi-label tasks with incomplete label information. We evaluate LML layers on top-k CIFAR-100 classification and scene graph generation. We show that LML layers add a negligible amount of computational overhead, strictly improve the model's representational capacity, and improve accuracy. We also revisit the truncated top-k entropy method as a competitive baseline for top-k classification.

[1]  J. Zico Kolter,et al.  OptNet: Differentiable Optimization as a Layer in Neural Networks , 2017, ICML.

[2]  André F. T. Martins,et al.  Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning , 2013, ACL.

[3]  Brendan J. Frey,et al.  Fast Exact Inference for Recursive Cardinality Models , 2012, UAI.

[4]  Andrew Zisserman,et al.  Smooth Loss Functions for Deep Top-k Classification , 2018, ICLR.

[5]  In-So Kweon,et al.  LinkNet: Relational Embedding for Scene Graph , 2018, NeurIPS.

[6]  Mathieu Blondel,et al.  Structured Prediction with Projection Oracles , 2019, NeurIPS.

[7]  Xiaogang Wang,et al.  Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation , 2018, ECCV.

[8]  Maya R. Gupta,et al.  Training highly multiclass classifiers , 2014, J. Mach. Learn. Res..

[9]  Andrew McCallum,et al.  Structured Prediction Energy Networks , 2015, ICML.

[10]  Cynthia Rudin,et al.  The P-Norm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List , 2009, J. Mach. Learn. Res..

[11]  Jonathan Berant,et al.  Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction , 2018, NeurIPS.

[12]  Svetlana Lazebnik,et al.  Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Bernt Schiele,et al.  Top-k Multiclass SVM , 2015, NIPS.

[14]  Rong Jin,et al.  Top Rank Optimization in Linear Time , 2014, NIPS.

[15]  Ryan P. Adams,et al.  Ranking via Sinkhorn Propagation , 2011, ArXiv.

[16]  Shivani Agarwal,et al.  The Infinite Push: A New Support Vector Ranking Algorithm that Directly Optimizes Accuracy at the Absolute Top of the List , 2011, SDM.

[17]  André F. T. Martins,et al.  Learning with Fenchel-Young Losses , 2020, J. Mach. Learn. Res..

[18]  Anoop Cherian,et al.  Visual Permutation Learning , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Claire Cardie,et al.  SparseMAP: Differentiable Sparse Structured Inference , 2018, ICML.

[20]  Yejin Choi,et al.  Neural Motifs: Scene Graph Parsing with Global Context , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Alain Rakotomamonjy,et al.  Sparse Support Vector Infinite Push , 2012, ICML.

[22]  Bodo Rosenhahn,et al.  On Support Relations and Semantic Scene Graphs , 2016, ArXiv.

[23]  Danfei Xu,et al.  Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Eric P. Xing,et al.  Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Scott W. Linderman,et al.  Learning Latent Permutations with Gumbel-Sinkhorn Networks , 2018, ICLR.

[26]  Stephen P. Boyd,et al.  Accuracy at the Top , 2012, NIPS.

[27]  Ramón Fernández Astudillo,et al.  From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification , 2016, ICML.

[28]  Amir Globerson,et al.  Predict and Constrain: Modeling Cardinality in Deep Structured Prediction , 2018, ICML.

[29]  Jia Deng,et al.  Pixels to Graphs by Associative Embedding , 2017, NIPS.

[30]  Jean-Charles Régin,et al.  Generalized Arc Consistency for Global Cardinality Constraint , 1996, AAAI/IAAI, Vol. 1.

[31]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Razvan Pascanu,et al.  Discovering objects and their relations from entangled scene representations , 2017, ICLR.

[33]  Bernt Schiele,et al.  Loss Functions for Top-k Error: Analysis and Insights , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  André F. T. Martins,et al.  Learning What’s Easy: Fully Differentiable Neural Easy-First Taggers , 2017, EMNLP.

[35]  André F. T. Martins,et al.  Sparse and Constrained Attention for Neural Machine Translation , 2018, ACL.

[36]  Michael S. Bernstein,et al.  Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Thomas G. Dietterich,et al.  Transductive Optimization of Top k Precision , 2015, IJCAI.

[38]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[39]  Fernando Pereira,et al.  Collective Entity Resolution with Multi-Focal Attention , 2016, ACL.