A Primal Dual Formulation For Deep Learning With Constraints

For several problems of interest, natural constraints exist over the output label space. For example, for the joint task of NER and POS labeling, these constraints might specify that the NER label ‘organization’ is consistent only with the POS labels ‘noun’ and ‘preposition’. Such constraints can be an effective way of injecting prior knowledge into a deep learning model, thereby improving overall performance. In this paper, we present a constrained optimization formulation for training a deep network with a given set of hard constraints on output labels. Our novel approach first converts the label constraints into soft logic constraints over the probability distributions output by the network. It then converts the constrained optimization problem into an alternating min-max optimization with Lagrangian variables defined for each constraint. Since the constraints are independent of the target labels, our framework easily generalizes to the semi-supervised setting. We experiment on the tasks of Semantic Role Labeling (SRL), Named Entity Recognition (NER) tagging, and fine-grained entity typing and show that our constraints not only significantly reduce the number of constraint violations, but can also result in state-of-the-art performance.
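The two steps described above (relaxing a label rule into a soft constraint over output probabilities, then alternating between minimizing over the network parameters and maximizing over a Lagrange multiplier) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the hinge-style relaxation of the NER/POS rule, the quadratic stand-in for the task loss, the use of raw logits in place of a network, the finite-difference gradient, and all function names and step sizes.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def violation(theta):
    """Hinge relaxation (assumed) of: NER=organization => POS in {noun, preposition}.

    theta[:2] are NER logits [organization, other];
    theta[2:] are POS logits [noun, preposition, verb].
    Returns 0 when the soft rule holds and a positive value otherwise.
    """
    p_org = softmax(theta[:2])[0]
    p_pos = softmax(theta[2:])
    return max(0.0, p_org - (p_pos[0] + p_pos[1]))


def task_loss(theta, theta0):
    """Hypothetical stand-in for a supervised loss: stay close to theta0."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(theta, theta0))


def lagrangian(theta, lam, theta0):
    """Task loss plus the multiplier-weighted constraint violation."""
    return task_loss(theta, theta0) + lam * violation(theta)


def grad_theta(theta, lam, theta0, eps=1e-5):
    """Central-difference gradient of the Lagrangian with respect to theta."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += eps
        down[i] -= eps
        diff = lagrangian(up, lam, theta0) - lagrangian(down, lam, theta0)
        grad.append(diff / (2 * eps))
    return grad


def train(theta0, steps=2000, lr=0.05, lr_dual=0.5):
    """Alternate a descent step on theta with an ascent step on lam."""
    theta = list(theta0)
    lam = 0.0
    for _ in range(steps):
        g = grad_theta(theta, lam, theta0)
        theta = [t - lr * gi for t, gi in zip(theta, g)]  # primal descent
        lam = max(0.0, lam + lr_dual * violation(theta))  # dual ascent, lam >= 0
    return theta, lam


if __name__ == "__main__":
    # Initial logits favour 'organization' together with 'verb', violating the rule.
    theta0 = [2.0, 0.0, 0.0, 0.0, 2.0]
    theta, lam = train(theta0)
    print("violation before:", violation(theta0))
    print("violation after: ", violation(theta))
    print("multiplier:      ", lam)
```

In this sketch the multiplier grows only while the relaxed constraint is violated, so the penalty on that constraint strengthens until the predicted distributions become approximately consistent with the rule. Because `violation` depends only on the predicted probabilities and not on gold labels, the same term could in principle be evaluated on unlabeled examples.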
