Algorithms and Analysis for Multi-Category Classification

Classification problems in machine learning involve assigning labels to various kinds of outputs, from single-assignment binary and multi-class classification to more complex assignments such as category ranking, sequence identification, and structured-output classification. Traditionally, most machine learning algorithms and theory have been developed for the binary setting. In this dissertation, we provide a framework that unifies these problems; through it, many algorithms and much of the theoretical understanding developed in the binary domain extend to more complex settings.

First, we introduce constraint classification, a learning framework that provides a unified view of complex-output problems. Within this framework, each complex-output label is viewed as a set of constraints sufficient to capture the information needed to classify the example. Prediction in the complex-output setting thus reduces to determining which constraints, out of a potentially large set, hold for a given example, a task that can be accomplished by repeatedly applying a single binary classifier to indicate whether each constraint holds. Using this insight, we provide a principled extension of binary learning algorithms, such as the support vector machine and the Perceptron algorithm, to the complex-output domain, and we show that the desirable theoretical and experimental properties of these algorithms are maintained in the new setting.

Second, we address the structured-output problem directly. Structured-output labels are collections of variables corresponding to a known structure, such as a tree, graph, or sequence, that can bias or constrain the global output assignment. The traditional approach to learning structured-output classifiers, which decomposes a structured output into multiple localized labels that are learned independently, is theoretically sub-optimal. In contrast, recent methods, such as constraint classification, that learn functions to directly classify the global output can achieve optimal performance. Surprisingly, in practice it is unclear which family of methods achieves state-of-the-art performance, so we study under what circumstances each method performs best. With enough time, training data, and representational power, the global approaches are better. However, we also show, both theoretically and experimentally, that learning a suite of local classifiers, even sub-optimal ones, can produce the best results in many real-world settings.

Third, we address an important algorithm in machine learning, the maximum margin classifier. Even with a conceptual understanding of how to extend maximum margin algorithms beyond the binary case, and with performance guarantees for large margin classifiers, complex outputs render the traditional approaches intractable. We introduce a new algorithm for learning maximum margin classifiers that uses coresets to find a provably approximate solution to the maximum margin separating hyperplane. Through the constraint classification framework, this algorithm then applies directly to all of the complex-output domains mentioned above. In addition, coresets motivate approximate algorithms for active learning and for learning in the presence of outlier noise, where we give simple, elegant, and previously unknown proofs of their effectiveness.
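To make the constraint-classification reduction concrete, the sketch below implements a multiclass Perceptron in this view: a label y imposes the pairwise constraints w_y · x > w_k · x for every competing class k, and each violated constraint triggers an ordinary binary-style Perceptron update. This is a minimal illustration of the idea under our own choice of names and update schedule, not the dissertation's full construction.

```python
import numpy as np

def constraint_perceptron(X, y, num_classes, epochs=10):
    """Multiclass Perceptron in the constraint-classification view:
    each label imposes pairwise constraints, and a violated constraint
    triggers a binary-style promote/demote update."""
    W = np.zeros((num_classes, X.shape[1]))   # one weight vector per class
    for _ in range(epochs):
        for x, yi in zip(X, y):
            scores = W @ x
            for k in range(num_classes):
                # The label yi imposes the constraint w_yi . x > w_k . x.
                if k != yi and scores[k] >= scores[yi]:
                    W[yi] += x                # promote the correct class
                    W[k] -= x                 # demote the violating class
                    scores = W @ x            # refresh after the update
    return W

def predict(W, x):
    # The predicted label is the one whose weight vector satisfies all
    # of its pairwise constraints, i.e. the argmax score.
    return int(np.argmax(W @ x))
```

Prediction here only ever compares linear scores, which is why a single binary-style decision ("does this constraint hold?") suffices as the underlying primitive.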
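For the second contribution, the local-versus-global contrast can be illustrated with sequence labeling. The sketch below is one plausible instantiation of the global approach, a structured-Perceptron-style learner that decodes the whole sequence with Viterbi before updating; the abstract does not name the exact learners studied, so the feature layout and update rule here are assumptions. The local alternative would simply train an independent per-position classifier and ignore the transition scores entirely.

```python
import numpy as np

def viterbi(emit, trans):
    """Best label sequence under per-position emission scores (T x K)
    plus first-order transition scores (K x K): the sequence structure."""
    T, K = emit.shape
    dp = np.empty((T, K)); back = np.zeros((T, K), dtype=int)
    dp[0] = emit[0]
    for t in range(1, T):
        cand = dp[t - 1][:, None] + trans     # cand[i, j]: prev i -> cur j
        back[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0) + emit[t]
    seq = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        seq.append(int(back[t, seq[-1]]))
    return seq[::-1]

def global_update(W, trans, x_seq, y_seq):
    """One structured-Perceptron step: decode the full sequence, then
    push weights toward the gold output and away from the prediction."""
    pred = viterbi(x_seq @ W.T, trans)
    for t, (p, g) in enumerate(zip(pred, y_seq)):
        if p != g:                            # emission features disagree
            W[g] += x_seq[t]; W[p] -= x_seq[t]
        if t > 0:                             # transition features
            trans[y_seq[t - 1], g] += 1.0     # promote the gold transition
            trans[pred[t - 1], p] -= 1.0      # demote the predicted one
    return W, trans
```

The global learner pays for its optimality in decoding cost and sample complexity, which is consistent with the observation above that a suite of local classifiers can win in practice when data or time is limited.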
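Finally, the coreset idea for the third contribution can be sketched as follows: train on a small working set, find the point the current hyperplane classifies with the worst margin, add it, and retrain. This is a minimal sketch assuming binary labels in {-1, +1} and using scikit-learn's SVC with a large C as a stand-in hard-margin solver; the dissertation's own coreset construction, stopping rule, and approximation guarantee are not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

def coreset_max_margin(X, y, tol=1e-3, max_iters=100):
    """Approximate the maximum margin hyperplane by training only on a
    small, greedily grown coreset. Assumes labels y are in {-1, +1}."""
    coreset = [int(np.where(y == c)[0][0]) for c in (-1, 1)]  # one seed per class
    clf = None
    for _ in range(max_iters):
        clf = SVC(kernel="linear", C=1e6)     # large C approximates hard margin
        clf.fit(X[coreset], y[coreset])
        w_norm = np.linalg.norm(clf.coef_)
        margins = y * clf.decision_function(X) / w_norm   # geometric margins
        worst = int(np.argmin(margins))
        # Stop when no point violates the coreset's margin by more than tol.
        if worst in coreset or margins[worst] >= margins[coreset].min() - tol:
            break
        coreset.append(worst)                 # add the worst violator, retrain
    return clf, coreset
```

Because each retraining touches only the coreset, the expensive optimization runs over a set whose size depends on the margin rather than on the number of examples, which is what makes the approach attractive for the large constraint sets produced by complex-output reductions.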
