A Survey of Transfer and Multitask Learning in Bioinformatics

Machine learning and data mining have found many applications in biological domains, where the goal is to build predictive models from labeled training data. In practice, however, high-quality labeled data is scarce, and labeling new data is costly. Transfer and multitask learning offer an attractive alternative: by extracting useful knowledge from data in auxiliary domains and transferring it to the target domain, they help counter the lack of data there. In this article, we survey recent advances in transfer and multitask learning for bioinformatics applications. In particular, we cover several key application areas, including sequence classification, gene expression data analysis, biological network reconstruction, and biomedical applications.

Category: Convergence computing
