Multi-Task Feature Selection on Multiple Networks via Maximum Flows

We propose a new formulation of multi-task feature selection coupled with multiple network regularizers, and show that the resulting problem can be solved exactly and efficiently by maximum flow algorithms. The method addresses a central question in data mining: how to exploit structural information in multivariate data analysis, a problem with numerous applications such as the analysis of gene regulatory and social networks. On simulated data, we show that the proposed method discovers causal features more accurately by solving multiple tasks simultaneously and using networks over features. We also apply the method to multi-locus association mapping with Arabidopsis thaliana genotypes and flowering time phenotypes, and demonstrate that it recovers more known phenotype-related genes than other state-of-the-art methods.
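To make the link between network-regularized feature selection and maximum flows concrete, the following is a minimal single-task sketch, not the formulation proposed in this paper. It assumes a hypothetical objective in which each feature i has a relevance score, a sparsity cost `eta` is paid per selected feature, and a penalty `lam` is paid for every network edge whose endpoints disagree. The function names, the parameters `eta` and `lam`, and the plain Edmonds-Karp solver are all illustrative choices. Under these assumptions, the optimal binary selection is the source side of a minimum s-t cut.

```python
from collections import deque


def objective(f, scores, edges, eta, lam):
    """Assumed energy: sum_i (eta - scores[i]) * f_i + lam * sum_{(i,j)} |f_i - f_j|."""
    unary = sum((eta - c) * fi for c, fi in zip(scores, f))
    pairwise = lam * sum(abs(f[i] - f[j]) for i, j in edges)
    return unary + pairwise


def select_features(scores, edges, eta, lam):
    """Minimise the assumed energy exactly via an s-t minimum cut (Edmonds-Karp)."""
    n = len(scores)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual arc

    for i, c in enumerate(scores):
        w = c - eta
        if w > 0:
            add_edge(s, i, w)    # cost of NOT selecting a useful feature
        elif w < 0:
            add_edge(i, t, -w)   # cost of selecting a weak feature
    for i, j in edges:
        add_edge(i, j, lam)      # cost of separating network neighbours
        add_edge(j, i, lam)

    def reachable():
        """Breadth-first search in the residual graph; returns the parent map."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

    # Features still reachable from the source lie on the selected side of the cut.
    return [1 if i in parent else 0 for i in range(n)]


# Toy example: four features on a chain network 0 - 1 - 2 - 3.
scores = [3.0, 0.5, 2.0, 0.25]
edges = [(0, 1), (1, 2), (2, 3)]
selected = select_features(scores, edges, eta=1.0, lam=0.5)
print(selected)
```

In this toy example feature 1 has a score below the sparsity cost, yet it is selected because both of its network neighbours are, which is the kind of effect a network regularizer is meant to produce. Extending the sketch to several tasks and several networks would require additional structure that is not shown here.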
