Active Learning for Constrained Dirichlet Process Mixture Models

Recent work applied Dirichlet Process Mixture Models to the task of verb clustering, incorporating supervision in the form of must-links and cannot-links constraints between instances. In this work, we introduce an active learning approach for constraint selection employing uncertainty-based sampling. We achieve substantial improvements over random selection on two datasets.

[1]  Suzanne Stevenson,et al.  Unsupervised Semantic Role Labellin , 2004, EMNLP.

[2]  Dan Klein,et al.  From Instance-level Constraints to Space-Level Constraints: Making the Most of Prior Knowledge in Data Clustering , 2002, ICML.

[3]  J. Lafferty,et al.  Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions , 2003, ICML 2003.

[4]  Ted Briscoe,et al.  Robust Accurate Statistical Annotation of General Text , 2002, LREC.

[5]  Chih-Jen Lin,et al.  Projected Gradient Methods for Nonnegative Matrix Factorization , 2007, Neural Computation.

[6]  M. Meilă Comparing clusterings---an information based distance , 2007 .

[7]  Beth Levin,et al.  English Verb Classes and Alternations: A Preliminary Investigation , 1993 .

[8]  Martha Palmer,et al.  Investigations into the role of lexical semantics in word sense disambiguation , 2004 .

[9]  Yee Whye Teh,et al.  A Hierarchical Bayesian Language Model Based On Pitman-Yor Processes , 2006, ACL.

[10]  Julia Hirschberg,et al.  V-Measure: A Conditional Entropy-Based External Cluster Evaluation Measure , 2007, EMNLP.

[11]  Claire Cardie,et al.  Clustering with Instance-Level Constraints , 2000, AAAI/IAAI.

[12]  Zoubin Ghahramani,et al.  Unsupervised and Constrained Dirichlet Process Mixture Models for Verb Clustering , 2009 .

[13]  Nigel Collier,et al.  Automatic Classification of Verbs in Biomedical Texts , 2006, ACL.

[14]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[15]  Benjamin C. M. Fung,et al.  Hierarchical Document Clustering using Frequent Itemsets , 2003, SDM.

[16]  Phil Blunsom,et al.  Inducing Compact but Accurate Tree-Substitution Grammars , 2009, NAACL.

[17]  Yuval Krymolowski,et al.  Verb Class Discovery from Rich Syntactic Data , 2008, CICLing.

[18]  T. Griffiths,et al.  Modeling individual differences using Dirichlet processes , 2006 .

[19]  Arindam Banerjee,et al.  Probabilistic Semi-Supervised Clustering with Constraints , 2006, Semi-Supervised Learning.