Inference Driven Metric Learning (IDML) for Graph Construction

Graph-based semi-supervised learning (SSL) methods usually consist of two stages: in the first stage, a graph is constructed from the set of input instances; in the second stage, the available label information, along with the constructed graph, is used to assign labels to the unlabeled instances. Most previously proposed graph construction methods are unsupervised in nature, as they ignore the label information already present in the SSL setting in which they operate. In this paper, we explore how available labeled instances can be used to construct a better graph, one tailored to the classification task at hand. To achieve this goal, we evaluate the effectiveness of various supervised metric learning algorithms during graph construction. Additionally, we propose a new metric learning framework, Inference Driven Metric Learning (IDML), which extends existing supervised metric learning algorithms to exploit widely available unlabeled data during the metric learning step itself. We provide extensive empirical evidence that inference over a graph constructed using an IDML-learned metric can significantly reduce classification error compared to inference over graphs constructed using existing techniques. Finally, we demonstrate how active learning can be incorporated within the IDML framework to reduce the amount of supervision needed during graph construction.

Comments: University of Pennsylvania Department of Computer and Information Science Technical Report No. MSCIS-10-18. Available at ScholarlyCommons: http://repository.upenn.edu/cis_reports/927
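The two-stage pipeline described above can be sketched in a few lines of code. The following is a minimal illustration, not the paper's actual algorithm: it builds a k-NN graph under a Mahalanobis distance parameterized by a matrix M (here simply the identity, standing in for whatever metric a supervised or IDML-style learner would produce), then runs a basic iterative label propagation over that graph. All function names and the toy data are hypothetical.

```python
import numpy as np

def mahalanobis_knn_graph(X, M, k=3):
    """Build a symmetric k-NN affinity graph using the Mahalanobis
    distance d(x, y)^2 = (x - y)^T M (x - y)."""
    n = len(X)
    diffs = X[:, None, :] - X[None, :, :]               # (n, n, d) pairwise differences
    dist2 = np.einsum('ijd,de,ije->ij', diffs, M, diffs)  # squared Mahalanobis distances
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist2[i])
        neighbors = [j for j in order if j != i][:k]    # k nearest, excluding self
        W[i, neighbors] = np.exp(-dist2[i, neighbors])  # Gaussian-style affinity
    return np.maximum(W, W.T)                           # symmetrize

def propagate_labels(W, y, labeled, n_classes, iters=50):
    """Simple iterative label propagation: repeatedly average neighbor
    label distributions, clamping the labeled nodes each round."""
    n = len(y)
    F = np.zeros((n, n_classes))
    F[labeled, y[labeled]] = 1.0
    d_inv = 1.0 / np.maximum(W.sum(axis=1), 1e-12)      # inverse node degrees
    for _ in range(iters):
        F = d_inv[:, None] * (W @ F)                    # diffuse labels over the graph
        F[labeled] = 0.0                                # clamp labeled nodes
        F[labeled, y[labeled]] = 1.0
    return F.argmax(axis=1)

# Toy example: two well-separated clusters, one labeled point in each.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
y = np.array([0, -1, -1, 1, -1, -1])                    # -1 marks unlabeled
M = np.eye(2)                                           # identity metric = Euclidean
W = mahalanobis_knn_graph(X, M, k=2)
pred = propagate_labels(W, y, labeled=np.array([0, 3]), n_classes=2)
print(pred)                                             # [0 0 0 1 1 1]
```

In the IDML setting, the metric M would not stay fixed: the framework alternates between inferring labels over the current graph and feeding the most confidently inferred labels back into the supervised metric learner, which is the loop the abstract refers to.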
