Bayesian Transfer Learning: An Overview of Probabilistic Graphical Models for Transfer Learning

Transfer learning, in which transferable knowledge is extracted from one or more source domains and reused in a target domain, has become a research area of great interest in artificial intelligence. Probabilistic graphical models (PGMs) are recognized as a powerful tool for modeling complex systems, with advantages such as the ability to handle uncertainty and good interpretability. Given the success of both research areas, it seems natural to apply PGMs to transfer learning. However, although some excellent PGMs specific to transfer learning already exist in the literature, the potential of PGMs for this problem remains grossly underestimated. This paper aims to boost the development of PGMs for transfer learning by 1) examining the pilot studies on PGMs specific to transfer learning, i.e., analyzing and summarizing the existing mechanisms designed specifically for knowledge transfer; 2) discussing examples of real-world transfer problems where existing PGMs have been successfully applied; and 3) exploring several potential research directions for transfer learning using PGMs.
