A Transfer Learning Approach for Predictive Modeling of Degenerate Biological Systems

Modeling a new domain can be challenging due to scarce data and high dimensionality. Transfer learning aims to integrate data from the new domain with knowledge about related old domains in order to model the new domain better. This article studies transfer learning for degenerate biological systems. Degeneracy refers to the phenomenon in which structurally different elements of a system perform the same or similar functions, or yield the same or similar outputs. Degeneracy exists in various biological systems and contributes to their heterogeneity, complexity, and robustness. Modeling degenerate biological systems is challenging, and models enabling transfer learning in such systems have received little study. In this article, we propose a predictive model that integrates transfer learning and degeneracy under a Bayesian framework. Theoretical properties of the proposed model are studied. Finally, we present an application that models the predictive relationship between transcription factors and gene expression across multiple cell lines. The model achieves good prediction accuracy and identifies known, and possibly new, degenerate mechanisms of the system. Supplementary materials for this article are available online.
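
The paper's model is not reproduced here; as a rough illustration of the kind of Bayesian transfer-learning regression the abstract describes, the sketch below fits a hierarchical linear model in which every cell line (task) shares a common coefficient vector and adds a small task-specific deviation, with Gaussian priors that turn the MAP estimate into a ridge-style least-squares problem. The function name fit_transfer_map, the hyperparameters lam_shared and lam_task, and the synthetic data are all hypothetical, and the sketch omits the degeneracy structure and Bayesian variable selection that the proposed model incorporates.

```python
# Minimal sketch (not the paper's model): hierarchical Bayesian linear
# regression for transfer learning across cell lines. Each cell line k has
# coefficients beta_k = beta_shared + delta_k; Gaussian priors on beta_shared
# and delta_k make the MAP estimate a ridge-regularized least-squares fit.
import numpy as np

def fit_transfer_map(X_list, y_list, lam_shared=1.0, lam_task=10.0):
    """MAP estimate of shared and cell-line-specific coefficients.

    X_list, y_list : per-cell-line design matrices (n_k x p) and responses.
    lam_shared     : prior precision on the shared coefficients.
    lam_task       : prior precision on the task-specific deviations
                     (larger -> cell lines borrow more strength from each other).
    """
    K = len(X_list)
    p = X_list[0].shape[1]
    # Augmented design with coefficient blocks [beta_shared, delta_1, ..., delta_K].
    rows, targets = [], []
    for k, (X, y) in enumerate(zip(X_list, y_list)):
        block = np.zeros((X.shape[0], p * (K + 1)))
        block[:, :p] = X                        # shared part
        block[:, p * (k + 1): p * (k + 2)] = X  # cell-line-specific part
        rows.append(block)
        targets.append(y)
    A = np.vstack(rows)
    b = np.concatenate(targets)
    # Independent Gaussian priors correspond to a block-diagonal ridge penalty.
    penalty = np.concatenate([np.full(p, lam_shared), np.full(p * K, lam_task)])
    coef = np.linalg.solve(A.T @ A + np.diag(penalty), A.T @ b)
    beta_shared = coef[:p]
    deltas = coef[p:].reshape(K, p)
    return beta_shared, deltas

# Usage with synthetic data standing in for transcription-factor features
# and gene-expression responses in three cell lines.
rng = np.random.default_rng(0)
K, n, p = 3, 50, 8
true_shared = rng.normal(size=p)
X_list = [rng.normal(size=(n, p)) for _ in range(K)]
y_list = [X @ (true_shared + 0.1 * rng.normal(size=p)) + 0.1 * rng.normal(size=n)
          for X in X_list]
beta_shared, deltas = fit_transfer_map(X_list, y_list)
print("max error in shared coefficients:", np.abs(beta_shared - true_shared).max())
```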
