On semi-supervised kernel methods

Semi-supervised learning is an emerging computational paradigm for learning from limited supervision by exploiting large amounts of inexpensive, unlabeled observations. This paradigm is appealing as a model for natural learning, and it also addresses a growing practical need in most, if not all, applications of machine learning, where abundant data can be collected cheaply and automatically but manual labeling for the purpose of training learning algorithms is often slow, expensive, and error-prone. In this thesis, we develop families of algorithms for semi-supervised inference. These algorithms are based on intuitions about the natural structure and geometry of the probability distributions that underlie typical datasets for learning. The classical framework of Regularization in Reproducing Kernel Hilbert Spaces (the basis of state-of-the-art supervised algorithms such as SVMs) is extended in several ways to utilize unlabeled data. These extensions are embodied in the following contributions: (1) Manifold Regularization is based on the assumption that high-dimensional data truly resides on low-dimensional manifolds. Ambient, globally defined kernels are combined with the intrinsic Laplacian regularizer to develop new kernels that immediately turn standard supervised kernel methods into semi-supervised learners. An outstanding problem of out-of-sample extension in graph transductive methods is resolved in this framework. (2) Low-density Methods bias learning so that data clusters are protected from being cut by decision boundaries, at the expense of turning regularization objectives into non-convex functionals. We analyze the nature of this non-convexity and propose deterministic annealing techniques to overcome local minima. (3) The Co-regularization framework is applicable in settings where data appears in multiple redundant representations.
Learners in each representation are biased to maximize consensus in their predictions through multi-view regularizers. (4) We develop l1 regularization and greedy matching pursuit algorithms for sparse non-linear manifold regularization. (5) We develop specialized linear algorithms for very large sparse data matrices and apply them to probe the utility of unlabeled documents for text classification. (6) Empirical results on a variety of semi-supervised learning tasks suggest that these algorithms obtain state-of-the-art performance.
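
To make contribution (1) concrete, the following is a minimal sketch of one way an ambient kernel and a graph Laplacian regularizer can be combined, using the squared loss (often called Laplacian regularized least squares). It is an illustration under stated assumptions, not the thesis's implementation: the Gaussian kernel, the 0/1-weighted k-nearest-neighbor graph, the parameter values, and all function names (rbf_kernel, knn_laplacian, fit_laprls, predict) are choices made here for the example. With l labeled points listed first among n points, the coefficients solve (J K + gamma_A l I + (gamma_I l / n^2) L K) alpha = Y, where J selects the labeled points and Y holds the labels padded with zeros.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def knn_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized k-NN graph (0/1 weights)."""
    n = X.shape[0]
    sq = (X ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(sq, np.inf)              # a point is not its own neighbor
    nbrs = np.argsort(sq, axis=1)[:, :k]      # indices of the k nearest points
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                    # make the graph undirected
    return np.diag(W.sum(1)) - W


def fit_laprls(X, y_labeled, gamma_A=1e-2, gamma_I=1e-1, k=5, gamma=1.0):
    """Fit on X whose first len(y_labeled) rows are labeled; the rest are unlabeled.

    Returns the expansion coefficients alpha of f(x) = sum_i alpha_i K(x, x_i).
    """
    n, l = X.shape[0], len(y_labeled)
    K = rbf_kernel(X, X, gamma)               # ambient (extrinsic) kernel
    L = knn_laplacian(X, k)                   # intrinsic smoothness penalty
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])
    Y = np.r_[np.asarray(y_labeled, dtype=float), np.zeros(n - l)]
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n ** 2) * (L @ K)
    return np.linalg.solve(A, Y)


def predict(alpha, X_train, X_new, gamma=1.0):
    """Evaluate the learned function at new points (out-of-sample extension)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


# Toy usage: two well-separated blobs with a single labeled point in each.
rng = np.random.default_rng(0)
blob_a = rng.normal([-2.0, 0.0], 0.2, size=(20, 2))
blob_b = rng.normal([2.0, 0.0], 0.2, size=(20, 2))
X = np.vstack([blob_a[:1], blob_b[:1], blob_a[1:], blob_b[1:]])
alpha = fit_laprls(X, y_labeled=[+1.0, -1.0])
scores = predict(alpha, X, X)                 # the sign of each score is the predicted class
```

Because the learned function is a kernel expansion over all labeled and unlabeled points, predict can be evaluated at points that were never part of the graph, which is the out-of-sample property mentioned in contribution (1). The toy data is only a smoke test; it is far too easy to say anything about the benefit of unlabeled data.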