Sparse conditional copula models for structured output regression

We address multiple-output regression, where the central goal is to capture the sparse correlation structure among the output variables. Sparse inverse covariance learning for linear Gaussian conditional models has recently been studied and shown to achieve strong prediction performance. However, it can fail when the true underlying input-output process is non-Gaussian and/or non-linear. We introduce a novel sparse conditional copula model to represent the joint density of the output variables. By incorporating a Gaussian copula function while modeling the univariate marginal densities with (non-Gaussian) mixtures of experts, we obtain a flexible representation that admits non-linear and non-Gaussian densities. We then propose a sparse learning method for this copula-based model that effectively imposes sparsity on the conditional dependencies among the output variables. The learning optimization is efficient because it decomposes into gradient-descent marginal density estimation and sparse inverse covariance learning for the copula function. Improved performance of the proposed approach is demonstrated on several high-dimensional image/vision tasks.

Highlights
- Sparse non-linear, non-Gaussian density modeling by conditional copula.
- Loose output correlation estimation by sparse copula inverse covariance learning.
- Efficient alternating optimization method for marginals and copula.
- Superior to existing multiple output regression methods on several datasets.
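The two-stage decomposition described in the abstract (fit the marginals, then learn a sparse inverse covariance for the Gaussian copula) can be illustrated with a rough sketch. This is not the authors' implementation. It makes several simplifying assumptions: the mixture-of-experts marginals are replaced by a least-squares fit plus an empirical CDF of the residuals, the L1-penalized precision matrix is estimated with a generic ADMM solver, and all function names (`normal_scores`, `sparse_precision_admm`, `fit_sketch`) and parameters (`lam`, `rho`, `n_iter`) are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(residuals):
    """Map each output column to Gaussian copula scores via its empirical CDF."""
    n = residuals.shape[0]
    ranks = residuals.argsort(axis=0).argsort(axis=0) + 1  # ranks 1..n per column
    u = ranks / (n + 1.0)                                   # strictly inside (0, 1)
    return np.vectorize(NormalDist().inv_cdf)(u)


def soft_threshold(a, t):
    """Elementwise shrinkage operator used by the L1 penalty."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def sparse_precision_admm(S, lam, rho=1.0, n_iter=200):
    """ADMM for min -logdet(Theta) + tr(S Theta) + lam * ||offdiag(Theta)||_1."""
    d = S.shape[0]
    Z = np.eye(d)
    U = np.zeros((d, d))
    off = ~np.eye(d, dtype=bool)
    for _ in range(n_iter):
        # Theta-update: closed form through an eigendecomposition.
        w, Q = np.linalg.eigh(rho * (Z - U) - S)
        theta_eig = (w + np.sqrt(w ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-update: shrink off-diagonal entries only (diagonal is unpenalized).
        Z = Theta + U
        Z[off] = soft_threshold(Z[off], lam / rho)
        # Scaled dual update.
        U += Theta - Z
    return Z  # sparse estimate of the copula precision matrix


def fit_sketch(X, Y, lam=0.1):
    """Stage 1: per-output marginals (stand-in). Stage 2: sparse copula precision."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)      # stand-in for the marginal models
    scores = normal_scores(Y - Xb @ W)
    S = np.corrcoef(scores, rowvar=False)
    return W, sparse_precision_admm(S, lam)
```

Calling `fit_sketch(X, Y, lam=0.2)` on an `(n, p)` input matrix and an `(n, d)` output matrix returns the regression weights and a `d x d` precision estimate whose zero off-diagonal entries indicate output pairs treated as conditionally independent. A larger `lam` yields a sparser estimate.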
