Simultaneous Parameter Learning and Bi-clustering for Multi-Response Models

We consider multi-response and multitask regression models in which the parameter matrix to be estimated is expected to have an unknown grouping structure. The groupings can be along tasks, along features, or along both; the last case yields a bi-cluster or "checkerboard" structure. Discovering this grouping structure jointly with parameter inference is useful in several applications, such as multi-response Genome-Wide Association Studies (GWAS). The additional structure can not only be leveraged for more accurate parameter estimation, but also provides valuable information on the underlying data mechanisms (e.g., relationships among genotypes and phenotypes in GWAS). In this paper, we propose two formulations, based on convex regularization penalties, to simultaneously learn the parameter matrix and its group structures. We present optimization approaches to solve the resulting problems and provide numerical convergence guarantees. Our approaches are validated on extensive simulations and on real datasets concerning phenotypes and genotypes of plant varieties.
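
To make the idea of a convex penalty that encourages a checkerboard structure concrete, the following is a minimal sketch of one possible objective. It is an illustrative assumption in the spirit of convex bi-clustering, not necessarily either of the two formulations proposed here: a squared-error loss, an l1 sparsity term, and unweighted pairwise fusion penalties on the rows (features) and columns (tasks) of the parameter matrix. The names `fusion_penalty`, `biclustering_objective`, `lam1`, and `lam2` are hypothetical.

```python
import numpy as np
from itertools import combinations


def fusion_penalty(M):
    """Sum of Euclidean distances between all pairs of rows of M.

    The penalty is zero exactly when all rows are identical, so
    minimizing it pulls rows toward shared values (clusters).
    """
    n_rows = M.shape[0]
    return sum(np.linalg.norm(M[i] - M[j])
               for i, j in combinations(range(n_rows), 2))


def biclustering_objective(X, Y, Theta, lam1, lam2):
    """Illustrative (assumed) convex objective for a multi-response model.

    X: (n, p) design matrix, Y: (n, k) responses, Theta: (p, k) parameters.
    lam1 weights sparsity; lam2 weights row and column fusion.
    """
    loss = 0.5 * np.linalg.norm(Y - X @ Theta, "fro") ** 2
    sparsity = np.abs(Theta).sum()
    # Fusing rows groups features; fusing columns groups tasks.
    fusion = fusion_penalty(Theta) + fusion_penalty(Theta.T)
    return loss + lam1 * sparsity + lam2 * fusion
```

In this sketch every pair of rows and every pair of columns is fused with equal weight; practical convex clustering methods often use data-dependent pair weights, which are omitted here for brevity.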
