Learning with Non-Standard Supervision

Machine learning has enjoyed astounding practical success in a wide range of applications in recent years, a practical success that often runs ahead of our theoretical understanding. The standard framework for machine learning theory assumes full supervision: the training data consists of correctly labeled i.i.d. examples drawn from the same task that the learned classifier will be applied to. However, many practical applications succeed by exploiting the sheer abundance of data that is currently produced, and such data may be unlabeled or may be collected from sources other than the target task. The focus of this thesis is a theoretical analysis of machine learning regimes in which the learner is given such, possibly large amounts of, imperfect training data. In particular, we investigate the benefits and limitations of learning with unlabeled data in semi-supervised learning and active learning, as well as the benefits and limitations of learning from data generated by a task that differs from the target task (domain adaptation). For all three settings, we propose Probabilistic Lipschitzness to model the relatedness between the labels and the underlying domain distribution, and we discuss our suggested notion by comparing it to other common data assumptions.
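To give the reader a concrete handle on the central notion, the following is one common formulation of Probabilistic Lipschitzness from the related literature; it is a sketch, and the exact variant used in the thesis may differ in its parameterization. For a domain X with metric \(\|\cdot\|\), a marginal distribution \(D\) over \(X\), a labeling function \(f : X \to \{0,1\}\), and a non-decreasing function \(\phi : \mathbb{R}^{+} \to [0,1]\), the pair \((D, f)\) satisfies \(\phi\)-Probabilistic Lipschitzness if

\[
\forall \lambda > 0: \quad
\Pr_{x \sim D}\Big[\, \exists\, y \in X \;:\; f(x) \neq f(y) \;\wedge\; \|x - y\| \le \lambda \,\Big] \;\le\; \phi(\lambda).
\]

Intuitively, \(\phi(\lambda)\) bounds the probability mass of points that have an oppositely labeled point within distance \(\lambda\); the faster \(\phi(\lambda)\) vanishes as \(\lambda \to 0\), the more "clustered" the labels are with respect to the marginal distribution. Taking \(\phi \equiv 0\) below some threshold recovers a hard margin-type condition, while slowly decaying \(\phi\) permits a controlled fraction of points near the decision boundary.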
