Unsupervised Aggregation for Classification Problems with Large Numbers of Categories

Classification problems with a very large or unbounded set of output categories are common in many areas, such as natural language and image processing. To improve accuracy on these tasks, it is natural for a decision-maker to combine predictions from various sources. However, the supervised data needed to fit an aggregation model is often difficult to obtain, especially when it is needed for multiple domains. We therefore propose a generative model for unsupervised aggregation which exploits the agreement signal to estimate the expertise of individual judges. Because the output space is so large, this aggregation model cannot encode the expertise of constituent judges with respect to every category for all problems. Consequently, we extend it by incorporating the notion of category types, which accounts for variability of judge expertise depending on the type. The viability of our approach is demonstrated both on synthetic experiments and on a practical task of syntactic parser aggregation.
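To make the idea of an agreement signal concrete, the following is a minimal sketch and not the generative model proposed in the paper. It assumes each judge can be summarized by a single reliability weight, and it alternates between a weighted vote and re-estimating each weight from how often the judge agrees with the current consensus. The function name `aggregate`, the smoothing constants, and the toy data are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def aggregate(votes, n_iter=10):
    """Unsupervised aggregation by iterated weighted voting (illustrative only).

    votes: list of items; each item is a list with one predicted label per judge.
    Returns the consensus label per item and one reliability weight per judge.
    """
    n_judges = len(votes[0])
    weights = [1.0] * n_judges  # start with all judges equally trusted
    consensus = []
    for _ in range(n_iter):
        # Step 1: weighted vote per item using the current judge weights.
        consensus = []
        for item in votes:
            scores = defaultdict(float)
            for j, label in enumerate(item):
                scores[label] += weights[j]
            consensus.append(max(scores, key=scores.get))
        # Step 2: re-estimate each judge's weight as a smoothed rate of
        # agreement with the consensus (add-one smoothing is an assumption).
        for j in range(n_judges):
            agree = sum(1 for item, c in zip(votes, consensus) if item[j] == c)
            weights[j] = (agree + 1) / (len(votes) + 2)
    return consensus, weights


# Toy data: three judges label five items drawn from a large label set.
votes = [
    ["a", "a", "q"],
    ["b", "b", "r"],
    ["c", "c", "c"],
    ["d", "d", "s"],
    ["e", "z", "t"],
]
consensus, weights = aggregate(votes)
```

With a large label set, two judges rarely agree by chance, so agreement is strong evidence of correctness. That is why even this crude scheme ranks the first judge above the third without any labeled data. A single weight per judge cannot express expertise that varies across categories, which is the limitation the category-type extension in the abstract is meant to address.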
