A geometric framework for transfer learning using manifold alignment

Many machine learning problems involve large amounts of high-dimensional data from diverse domains. In addition, annotating or labeling the data is expensive because it requires significant human effort. This dissertation explores a joint solution to both problems by exploiting the property that high-dimensional data in real-world application domains often lies on a lower-dimensional structure whose geometry can be modeled as a graph or manifold. In particular, we propose a set of novel manifold-alignment-based approaches to transfer learning. The proposed approaches transfer knowledge across domains by finding low-dimensional embeddings of the datasets in a common latent space that simultaneously match corresponding instances and preserve the local or global geometry of each input dataset.

We develop a novel two-step transfer learning method called Procrustes alignment. Procrustes alignment first maps each dataset to a low-dimensional latent space that reflects its intrinsic geometry, and then removes the translational, rotational and scaling components from one set so that the optimal alignment between the two sets can be achieved. This approach can preserve either global or local geometry, depending on the dimensionality reduction method used in the first step.

We also propose a general one-step manifold alignment framework called manifold projections, which can find alignments both across instances and across features while preserving local domain geometry. We develop and mathematically analyze several extensions of this framework to more challenging situations: (1) when no correspondences across domains are given; (2) when the global geometry of each input domain needs to be respected; and (3) when label information, rather than correspondence information, is available.

A final contribution of this thesis is the study of multiscale methods for manifold alignment. Multiscale alignment automatically generates alignment results at different levels by discovering the multilevel intrinsic structures shared by the given datasets, providing a common representation across all input datasets.
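The two steps of Procrustes alignment can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the thesis's implementation: step 1 uses Laplacian eigenmaps on an unweighted k-nearest-neighbour graph (a local-geometry choice; a global method such as Isomap could be swapped in), corresponding instances are assumed to share the same row index in both datasets, and the helper names knn_graph, laplacian_eigenmap and procrustes_align are hypothetical. Step 2 solves the orthogonal Procrustes problem on the corresponding rows: after centering, the SVD Yc^T Xc = U S V^T gives the orthogonal map Q = U V^T and the scale k = trace(S) / ||Yc||_F^2, which are then applied to every row of the second embedding.

```python
import numpy as np


def knn_graph(X, n_neighbors=8):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix over the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)  # no self-loops
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)  # i~j if either is among the other's neighbours


def laplacian_eigenmap(X, dim=2, n_neighbors=8):
    """Step 1 (one possible choice): Laplacian eigenmaps, L f = lambda D f."""
    W = knn_graph(X, n_neighbors)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Equivalent symmetric problem: (I - D^-1/2 W D^-1/2) v = lambda v, f = D^-1/2 v.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    F = d_inv_sqrt[:, None] * vecs
    return F[:, 1:dim + 1]  # skip the trivial constant solution


def procrustes_align(X_emb, Y_emb, corr):
    """Step 2: remove translation, rotation and scaling of Y_emb relative to X_emb.

    corr holds row indices assumed to correspond in both embeddings. Solves
    min over k and orthogonal Q of ||Xc - k * Yc @ Q||_F on those rows, then
    applies the same transform to every row of Y_emb.
    """
    mu_x, mu_y = X_emb[corr].mean(axis=0), Y_emb[corr].mean(axis=0)
    Xc, Yc = X_emb[corr] - mu_x, Y_emb[corr] - mu_y  # translation removed
    U, s, Vt = np.linalg.svd(Yc.T @ Xc)
    Q = U @ Vt  # optimal orthogonal map (may include a reflection)
    k = s.sum() / np.sum(Yc ** 2)  # optimal isotropic scale
    return k * (Y_emb - mu_y) @ Q + mu_x, k, Q


# Usage: the same helix described in two different coordinate systems.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3 * np.pi, 200))
X = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Y = 2.5 * X @ R + np.array([1.0, -2.0, 3.0])

X_emb = laplacian_eigenmap(X)
Y_emb = laplacian_eigenmap(Y)
corr = np.arange(0, 200, 10)  # assume every 10th instance has a known match
Y_aligned, k, Q = procrustes_align(X_emb, Y_emb, corr)
print("max residual:", np.abs(Y_aligned - X_emb).max())
```

In this toy setting Y is an exact rotation, scaling and translation of X, so the two unweighted k-NN graphs coincide and any remaining difference between the embeddings is an orthogonal ambiguity (eigenvector signs or rotations) that the Procrustes step removes. With real cross-domain data, the residual on instances outside corr would be one way to gauge alignment quality.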

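The one-step idea behind manifold projections, learning a projection for each domain jointly instead of aligning two fixed embeddings, can be illustrated with a joint graph Laplacian. The sketch below is an illustration under assumptions rather than the framework as defined in the thesis: it builds one graph over the instances of both domains (unweighted k-NN edges within each domain, correspondence edges scaled by a hypothetical weight mu between domains) and finds linear maps alpha and beta by solving the generalized eigenproblem Z^T L Z gamma = lambda Z^T D Z gamma, where Z = blockdiag(X, Y), L = D - W is the joint Laplacian and gamma stacks alpha over beta. Because alpha and beta act on features, each original feature of either domain also receives coordinates in the shared latent space (its row of alpha or beta). The names linear_manifold_alignment and knn_graph are hypothetical.

```python
import numpy as np


def knn_graph(X, n_neighbors=8):
    """Symmetric 0/1 k-nearest-neighbour adjacency matrix over the rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)  # no self-loops
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), n_neighbors), nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)


def linear_manifold_alignment(X, Y, W_x, W_y, W_corr, mu=1.0, dim=2):
    """Projections alpha (p x dim) and beta (q x dim) minimizing
    mu * sum_ij W_corr[i, j] ||x_i alpha - y_j beta||^2
    + 1/2 sum_ij W_x[i, j] ||x_i alpha - x_j alpha||^2
    + 1/2 sum_ij W_y[i, j] ||y_i beta - y_j beta||^2,
    under a degree-weighted scale constraint (rows are instances)."""
    m, p = X.shape
    n, q = Y.shape
    # Joint graph over all m + n instances.
    W = np.block([[W_x, mu * W_corr], [mu * W_corr.T, W_y]])
    D = np.diag(W.sum(axis=1))
    L = D - W  # joint graph Laplacian
    # Block-diagonal data matrix: Z @ [alpha; beta] = [X @ alpha; Y @ beta].
    Z = np.block([[X, np.zeros((m, q))], [np.zeros((n, p)), Y]])
    A = Z.T @ L @ Z  # cost matrix
    B = Z.T @ D @ Z  # constraint matrix; positive definite if Z has full column rank
    # Reduce A g = lambda B g to a symmetric eigenproblem using B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B))
    M = C_inv @ A @ C_inv.T
    evals, U = np.linalg.eigh((M + M.T) / 2)  # ascending eigenvalues
    gamma = C_inv.T @ U[:, :dim]  # generalized eigenvectors, smallest eigenvalues
    return gamma[:p], gamma[p:], evals[:dim]


# Usage: the same 200 instances described in two rotated feature spaces.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 3 * np.pi, 200))
X = np.column_stack([np.cos(t), np.sin(t), 0.5 * t])
X -= X.mean(axis=0)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Y = X @ R
alpha, beta, evals = linear_manifold_alignment(
    X, Y, knn_graph(X), knn_graph(Y), np.eye(200), mu=100.0, dim=2)
F_x, F_y = X @ alpha, Y @ beta  # both domains in the shared 2-D latent space
print("max gap between matched instances:", np.abs(F_x - F_y).max())
```

In this toy example Y is X expressed in rotated coordinates with one-to-one correspondences, so with a large mu (100 here) the smallest solutions satisfy X @ alpha = Y @ beta and matched instances land on the same points in the latent space. Smaller values of mu trade correspondence matching against preserving each domain's neighbourhood graph.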