NTPC: N-fold Templated Piped Correction

We describe a broadly applicable, conservative error-correcting model, N-fold Templated Piped Correction or NTPC (“nitpick”), that consistently improves the accuracy of existing high-accuracy base models. Under circumstances where most obvious approaches reduce accuracy more than they improve it, NTPC nevertheless carries little risk of accidentally degrading performance. NTPC is particularly well suited to natural language applications involving high-dimensional feature spaces, such as bracketing and disambiguation tasks, since its easily customizable template-driven learner allows efficient search over the kinds of complex feature combinations that have typically eluded the base models. We show empirically that NTPC yields small but consistent accuracy gains on top of even high-performing models such as boosting. We also give evidence that the extreme design parameters in NTPC are indeed necessary for the intended operating range, even though they diverge from usual practice.
