Multi-Task Feature Interaction Learning

One major limitation of linear models is that they cannot capture predictive information from interactions between features. Introducing high-order feature interaction terms overcomes this limitation, but it greatly increases model complexity and makes the model much harder to learn without overfitting. In this paper, we propose a novel Multi-Task feature Interaction Learning (MTIL) framework that exploits task relatedness in high-order feature interactions and improves generalization through inductive transfer among tasks via shared representations of feature interactions. We formulate two concrete approaches under this framework, the shared interaction approach and the embedded interaction approach, and provide efficient algorithms for solving both. The former assumes that tasks share the same set of interactions, and the latter assumes that the feature interactions of multiple tasks come from a shared subspace. Extensive empirical studies on both synthetic and real datasets demonstrate the effectiveness of the proposed framework.
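To make the setting concrete, the following is a minimal sketch of a linear model extended with interaction terms, restricted to pairwise (second-order) interactions as the simplest case. The symbols `w`, `Q`, the function names, and the problem sizes are illustrative assumptions and are not taken from the paper. The sketch only shows how the parameter count grows and how per-task interaction matrices stack into a tensor; it does not implement the shared interaction or embedded interaction approach.

```python
import numpy as np

def predict_with_interactions(X, w, Q):
    """Linear term plus pairwise interactions: f(x) = x^T w + x^T Q x."""
    linear = X @ w
    # einsum evaluates x_i^T Q x_i for every row x_i of X
    pairwise = np.einsum("ni,ij,nj->n", X, Q, X)
    return linear + pairwise

def interaction_param_count(d):
    """Free parameters in a symmetric d x d interaction matrix."""
    return d * (d + 1) // 2

rng = np.random.default_rng(0)
n, d, T = 5, 4, 3  # samples, features, tasks (illustrative sizes)
X = rng.normal(size=(n, d))

# One (w_t, Q_t) pair per task. Stacking the Q_t yields a d x d x T tensor,
# which is where structure shared across tasks could be imposed.
W = rng.normal(size=(d, T))
Q = rng.normal(size=(d, d, T))
Q = (Q + Q.transpose(1, 0, 2)) / 2  # symmetrize each task's interaction matrix

preds = np.stack(
    [predict_with_interactions(X, W[:, t], Q[:, :, t]) for t in range(T)],
    axis=1,
)
print(preds.shape)                 # (5, 3)
print(interaction_param_count(d))  # 10
```

With `d` features, each task adds on the order of `d**2` interaction parameters to the `d` linear ones, which is why learning each task in isolation invites overfitting and why sharing structure across the stacked interaction matrices is attractive.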
