Learning from ambiguous examples
The main drawback of the supervised learning approach to pattern classification is that the initial instance-label pairs are often expensive to collect, because they require human effort or comprehensive testing. In many applications, however, it is more practical, and sometimes essential, to collect training examples that are ambiguous due to polymorphism or missing labels. Because each ambiguous example admits only a small number of interpretations as instance-label pairs, it is still informative. It is therefore of great interest to practitioners and machine learning researchers alike to develop principled methods that can use such examples. This thesis demonstrates that the burden of collecting a large number of labeled examples can be relieved by algorithms that learn from readily available ambiguous inputs.
This thesis presents a framework for learning from ambiguous examples that is grounded in statistical learning theory and based on a novel formalization of disambiguation consistency. Intuitively, a valid interpretation and a concept hypothesis must be mutually reinforcing. Using this principle, disambiguation and learning are jointly formulated as a non-convex maximum-margin problem. The first algorithm presented for solving this joint problem uses a two-stage mixed-integer local search technique that leverages state-of-the-art support vector machine software. The other two algorithms use novel specializations of methods from disjunctive programming, a branch of combinatorial optimization. The first and third algorithms are of practical importance because they are efficient and scale to large data sets.
Empirical results on benchmark data sets from the multi-instance and transductive learning domains are provided. These results demonstrate that, by accounting for ambiguity explicitly, classifier accuracy does not degrade and can improve significantly over techniques that ignore ambiguity.
The results also suggest that our approach is best suited for tasks in which one and only one interpretation of each ambiguous example is valid, which is a reasonable assumption when the ambiguity is due to missing labels in the training data.
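To make the idea of jointly disambiguating and learning concrete, the following is a minimal sketch of an alternating local search for the multi-instance setting. It is not the algorithm from the thesis: it is a generic heuristic in the same spirit, which fixes one interpretation per ambiguous example, trains a maximum-margin classifier, and then re-selects the interpretations that the classifier supports best. It assumes that each positive bag contains exactly one valid positive instance (the "witness") and that every instance in a negative bag is negative. The toy data, the function names, the centroid initialization, and the small subgradient-descent SVM used in place of real SVM software are all hypothetical choices made for illustration.

```python
def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def score(w, b, x):
    return dot(w, x) + b

def fit_linear_svm(X, y, lam=0.01, lr=0.1, steps=300):
    """Batch subgradient descent on the L2-regularised hinge loss.
    A stand-in for an off-the-shelf SVM solver (assumption)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * score(w, b, xi) < 1.0:  # margin violated
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def centroid(bag):
    return tuple(sum(col) / len(bag) for col in zip(*bag))

def disambiguate_and_learn(pos_bags, neg_bags, max_rounds=10):
    """Alternate between (1) training with the interpretation fixed and
    (2) re-choosing each positive bag's witness with the classifier fixed."""
    neg_X = [x for bag in neg_bags for x in bag]
    reps = [centroid(bag) for bag in pos_bags]  # initial guess (assumption)
    witnesses = None
    for _ in range(max_rounds):
        X = reps + neg_X
        y = [1] * len(reps) + [-1] * len(neg_X)
        w, b = fit_linear_svm(X, y)
        new = [max(range(len(bag)), key=lambda k: score(w, b, bag[k]))
               for bag in pos_bags]
        if new == witnesses:  # interpretation and hypothesis agree: stop
            break
        witnesses = new
        reps = [bag[k] for bag, k in zip(pos_bags, witnesses)]
    return w, b, witnesses

def predict_bag(w, b, bag):
    """A bag is positive if its best-scoring instance is positive."""
    return 1 if max(score(w, b, x) for x in bag) > 0 else -1

# Hypothetical toy data: each positive bag hides one instance near (2, 2).
POS_BAGS = [
    [(-2.0, -1.5), (2.0, 1.5)],
    [(1.5, 2.5), (-1.0, -2.0), (-2.5, -1.0)],
    [(-1.5, -2.5), (-2.0, -2.0), (2.5, 2.0)],
]
NEG_BAGS = [
    [(-2.0, -2.5), (-1.5, -1.0)],
    [(-2.5, -2.0), (-1.0, -1.5), (-3.0, -1.0)],
]

w, b, witnesses = disambiguate_and_learn(POS_BAGS, NEG_BAGS)
print("selected witness index per positive bag:", witnesses)
```

The loop stops when the selected interpretation no longer changes, which is one simple reading of an interpretation and a hypothesis being "mutually reinforcing". Like any local search on a non-convex objective, it can stop at a poor solution if the initial interpretation is bad.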