Representations for category disambiguation

Because disambiguating a word's category in its local context underlies POS tagging, category induction, and human category acquisition, we investigate what information is needed for this disambiguation when using corpus categories. Specifically, we increase the recall of an error detection method by abstracting the word to be disambiguated to a representation that encodes its inherent properties, namely the set of categories it can potentially have. This work thus provides insight into how corpus categories relate to categories derived from local contexts.
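The Python sketch below illustrates the abstraction step. It rests on several assumptions that are not taken from this work. The "local context" is one word on each side. A word's set of potential categories (its ambiguity class) is read off the tags it receives in the corpus itself. The error detector is a simplified, hypothetical stand-in that flags any context whose focus position carries more than one tag. The function names and the toy corpus are also hypothetical. This is not the method or data used here.

```python
from collections import defaultdict


def ambiguity_classes(corpus):
    """Map each word to the sorted tuple of tags it receives anywhere in the corpus.

    Assumption (hypothetical): the corpus annotation itself defines the set of
    categories a word can potentially have.
    """
    tags = defaultdict(set)
    for sent in corpus:
        for word, tag in sent:
            tags[word].add(tag)
    return {word: tuple(sorted(ts)) for word, ts in tags.items()}


def variation_contexts(corpus, abstract=False):
    """Collect focus-position tags keyed by (left word, focus, right word).

    With abstract=False the focus is the word itself. With abstract=True the
    focus is replaced by its ambiguity class, so different words with the same
    set of possible tags share a key. Only keys whose focus position carries
    more than one tag (variation, i.e. a candidate error) are returned.
    """
    classes = ambiguity_classes(corpus)
    seen = defaultdict(set)
    for sent in corpus:
        # Skip sentence edges: each focus word needs one neighbor on each side.
        for i in range(1, len(sent) - 1):
            left = sent[i - 1][0]
            word, tag = sent[i]
            right = sent[i + 1][0]
            focus = classes[word] if abstract else word
            seen[(left, focus, right)].add(tag)
    return {key: ts for key, ts in seen.items() if len(ts) > 1}


# Hypothetical toy corpus: "run" and "walk" are each tagged both NN and VB.
corpus = [
    [("the", "DT"), ("run", "NN"), ("ended", "VBD")],
    [("the", "DT"), ("walk", "VB"), ("ended", "VBD")],  # suspicious tag
    [("we", "PRP"), ("run", "VB"), ("home", "NN")],
    [("a", "DT"), ("walk", "NN"), ("helps", "VBZ")],
]

print(variation_contexts(corpus))  # → {}
print(variation_contexts(corpus, abstract=True))
```

When keyed on the word itself, the toy corpus shows no variation, because each word-context pair occurs only once. When keyed on the ambiguity class, *run* and *walk* (both NN/VB) fall together in the context *the _ ended*. There they are tagged NN and VB respectively, so that context is flagged. This is one way abstracting the focus word can raise recall: more occurrences become comparable. The sketch does not address the trade-off that some flagged variation may reflect genuine ambiguity rather than an annotation error.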
