Semi-Supervised Learning with Explicit Misclassification Modeling

This paper investigates a new approach for training discriminant classifiers when only a small set of labeled data is available together with a large set of unlabeled data. The algorithm optimizes the classification maximum likelihood criterion on the combined labeled and unlabeled data, using a variant of the Classification Expectation Maximization (CEM) algorithm. Its originality lies in using both the unlabeled data and a probabilistic misclassification model for these data. The parameters of this label-error model are learned together with the classifier parameters. We demonstrate the effectiveness of the approach on four datasets and show its advantages over a previously developed semi-supervised algorithm that does not account for imperfections in the labeling process.
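
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea under explicit assumptions, not the authors' algorithm. It assumes binary labels, a linear logistic classifier, imperfect labels y_tilde for the unlabeled points produced once by a classifier trained on the labeled data alone, and a hypothetical 2x2 misclassification matrix beta[k][h] = P(y_tilde = k | true class h). A CEM loop then alternates a C-step, which assigns each unlabeled point the true class h maximizing beta[y_tilde][h] * p(h | x), with an M-step that re-estimates beta from counts (Laplace-smoothed, an assumption added to avoid zero probabilities) and refits the classifier on labeled plus hard-labeled unlabeled data. All function names (fit_logreg, cem_with_noise_model, and so on) are illustrative.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_pos(w, x):
    # p(y = 1 | x) for a linear model; w[0] is the bias term.
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def fit_logreg(xs, ys, w=None, lr=0.1, epochs=200):
    # Batch gradient ascent on the binary logistic log-likelihood.
    d = len(xs[0])
    w = list(w) if w is not None else [0.0] * (d + 1)
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(xs, ys):
            err = y - prob_pos(w, x)
            grad[0] += err
            for i, xi in enumerate(x):
                grad[i + 1] += err * xi
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


def cem_with_noise_model(xl, yl, xu, iters=10):
    # Hypothetical setup: a labeled-only classifier provides the imperfect
    # labels y_tilde for the unlabeled points; they stay fixed afterwards.
    w = fit_logreg(xl, yl)
    y_tilde = [1 if prob_pos(w, x) >= 0.5 else 0 for x in xu]
    # beta[k][h] = P(y_tilde = k | true class h); start near the identity.
    beta = [[0.9, 0.1], [0.1, 0.9]]
    y_hat = list(y_tilde)
    for _ in range(iters):
        # C-step: hard-assign the true class h maximizing beta[k][h] * p(h | x).
        new_hat = []
        for x, k in zip(xu, y_tilde):
            p1 = prob_pos(w, x)
            score0 = beta[k][0] * (1.0 - p1)
            score1 = beta[k][1] * p1
            new_hat.append(1 if score1 > score0 else 0)
        y_hat = new_hat
        # M-step (noise model): Laplace-smoothed counts, each column sums to 1.
        counts = [[1.0, 1.0], [1.0, 1.0]]
        for k, h in zip(y_tilde, y_hat):
            counts[k][h] += 1.0
        for h in (0, 1):
            col = counts[0][h] + counts[1][h]
            beta[0][h] = counts[0][h] / col
            beta[1][h] = counts[1][h] / col
        # M-step (classifier): a few gradient steps on labeled data plus
        # hard-labeled unlabeled data, so this maximization is approximate.
        w = fit_logreg(xl + xu, yl + y_hat, w=w, epochs=50)
    return w, beta, y_hat


# Toy usage: a 1-D problem with four labeled and six unlabeled points.
xl = [[-2.0], [-1.5], [1.5], [2.0]]
yl = [0, 0, 1, 1]
xu = [[-3.0], [-1.0], [-0.5], [0.5], [1.0], [3.0]]
w, beta, y_hat = cem_with_noise_model(xl, yl, xu)
print("weights:", w)
print("beta:", beta)
print("labels for unlabeled points:", y_hat)
```

On this cleanly separable toy problem the imperfect labels happen to be correct, so beta settles on a diagonal-dominant matrix; the noise model only changes the assignments when the classifier's posterior is uncertain enough for beta to outweigh it.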
