Query Selection via Weighted Entropy in Graph-Based Semi-supervised Classification

Considerable recent effort has gone into using unlabeled data in conjunction with labeled data in machine learning. Semi-supervised learning and active learning are two well-known techniques that exploit unlabeled data in the learning process. In this work, active learning is used on top of a semi-supervised classifier to query the label of an unlabeled example. The work focuses on the query selection criterion. The proposed criterion selects the example whose label change results in the largest perturbation of the other examples' labels. Experimental results show the effectiveness of the proposed query selection criterion in comparison to existing techniques.
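As a rough illustration of the selection idea described in the abstract, the sketch below scores each unlabeled node of a graph by how much the predictions on the remaining unlabeled nodes move when that node's label is switched between the two classes. It is not the paper's weighted-entropy criterion, whose exact form is not given here. It assumes binary labels, a symmetric weight matrix `W`, and the harmonic-function solution as the underlying semi-supervised classifier; the function names `harmonic_solution` and `select_query` and the toy graph are hypothetical.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, labels):
    """Soft labels for unlabeled nodes via the harmonic-function solution.

    Assumes every unlabeled node is connected to some labeled node,
    so that the unlabeled block of the graph Laplacian is invertible.
    """
    n = W.shape[0]
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    laplacian = np.diag(W.sum(axis=1)) - W
    L_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(labels, dtype=float))
    return unlabeled_idx, f_u

def select_query(W, labeled_idx, labels):
    """Pick the unlabeled node whose label flip perturbs the others most."""
    unlabeled_idx, _ = harmonic_solution(W, labeled_idx, labels)
    best, best_score = None, -1.0
    for k in unlabeled_idx:
        idx = list(labeled_idx) + [k]
        # Predictions on the remaining nodes when k is clamped to each class.
        _, f0 = harmonic_solution(W, idx, list(labels) + [0.0])
        _, f1 = harmonic_solution(W, idx, list(labels) + [1.0])
        score = float(np.abs(f1 - f0).sum())
        if score > best_score:
            best, best_score = int(k), score
    return best, best_score

# Toy graph (an assumption for illustration): nodes 0 and 1 are labeled,
# node 2 is a hub joined to both and to three leaf nodes 3, 4, 5.
W = np.zeros((6, 6))
for i, j in [(0, 2), (1, 2), (2, 3), (2, 4), (2, 5)]:
    W[i, j] = W[j, i] = 1.0

best, score = select_query(W, labeled_idx=[0, 1], labels=[0.0, 1.0])
print(best, score)
```

On this toy graph the hub is selected, because fixing its label determines the predictions on all three leaves, whereas fixing a leaf only shifts the remaining nodes slightly. Each candidate costs two linear solves, so this brute-force loop is only practical for small graphs.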
