Distributionally Robust Graphical Models

In many structured prediction problems, complex relationships between variables are compactly defined using graphical structures. The two most prevalent graphical prediction methods, probabilistic graphical models and large-margin methods, each have distinct strengths but also significant drawbacks. Conditional random fields (CRFs) are Fisher consistent, but they do not permit customized loss metrics to be integrated into their learning process. Large-margin models, such as structured support vector machines (SSVMs), can incorporate customized loss metrics but lack Fisher consistency guarantees. We present adversarial graphical models (AGM), a distributionally robust approach for constructing a predictor that performs robustly for a class of data distributions defined using a graphical structure. Our approach offers both the flexibility to incorporate customized loss metrics into its design and the statistical guarantee of Fisher consistency. We present exact learning and prediction algorithms for AGM whose time complexity is similar to that of existing graphical models, and we demonstrate the practical benefits of our approach experimentally.
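To make "distributionally robust with a customized loss" concrete, the sketch below solves the simplest instance of such a game: a single output variable, where a predictor chooses a distribution over labels to minimize expected loss against an adversary that chooses the worst-case label distribution. This is a minimal illustration under stated assumptions, not AGM itself. In the full method the adversary is constrained to match feature statistics over the graph, and learning and prediction use specialized algorithms rather than a generic linear program. The function name `robust_predictor` and the optional per-label potential vector `psi` (assumed here to stand in for how learned potentials enter the adversary's payoff) are hypothetical. The LP is solved with SciPy's `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def robust_predictor(loss, psi=None):
    """Solve min_p max_q p^T (loss + psi) q for ONE output variable (illustration only).

    loss[i, j]: customized cost of predicting label i when the adversary picks label j.
    psi[j]: hypothetical per-label potential added to the adversary's payoff;
            zero by default, i.e. an adversary constrained only to be a distribution.
    Returns (p, value): a minimax predictor distribution and the game value.
    """
    loss = np.asarray(loss, dtype=float)
    n_pred, n_adv = loss.shape
    psi = np.zeros(n_adv) if psi is None else np.asarray(psi, dtype=float)
    payoff = loss + psi[np.newaxis, :]  # add psi[j] to every entry of column j

    # Decision variables x = [p_1, ..., p_n, v]; minimize the game value v.
    c = np.zeros(n_pred + 1)
    c[-1] = 1.0
    # For every adversary label j: sum_i p_i * payoff[i, j] - v <= 0.
    A_ub = np.hstack([payoff.T, -np.ones((n_adv, 1))])
    b_ub = np.zeros(n_adv)
    # The predictor's probabilities sum to one.
    A_eq = np.hstack([np.ones((1, n_pred)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n_pred + [(None, None)]  # p >= 0, v unbounded

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n_pred], res.x[-1]


# Zero-one loss over three labels: the minimax predictor is uniform, value 2/3.
p01, v01 = robust_predictor(1.0 - np.eye(3))

# A customized (ordinal, absolute-difference) loss over three labels.
ordinal = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
p_ord, v_ord = robust_predictor(ordinal)
print(np.round(p01, 3), round(v01, 3), round(v_ord, 3))
```

Swapping the loss matrix is the only change needed to target a different evaluation metric, which is the flexibility the abstract attributes to the adversarial formulation. For the ordinal loss above, the game value is 1, and several predictor distributions attain it.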
