Compiling Boostexter Rules into a Finite-state Transducer

A number of NLP tasks have been effectively modeled as classification tasks using a variety of classification techniques. Most of these tasks have been pursued in isolation, with the classifier assuming unambiguous input. For these techniques to be more broadly applicable, they need to be extended to apply to weighted packed representations of ambiguous input. One approach to achieving this is to represent the classification model as a weighted finite-state transducer (WFST). In this paper, we present a compilation procedure that converts the rules resulting from an AdaBoost classifier into a WFST. We validate the compilation technique by applying the resulting WFST to a call-routing application.
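
As a rough illustration of the idea, and not the compilation procedure of the paper itself, the sketch below treats a boosted classifier as a list of word-triggered rules that each add a weight to a class score, then recasts the same rules as weighted self-loop arcs of a one-state machine per class. The rule list, the class labels, the per-token (rather than per-presence) scoring, and all function names are assumptions made up for this example.

```python
from collections import defaultdict

# Hypothetical toy rules (word, class label, weight); not taken from the paper.
RULES = [
    ("bill", "Billing", 1.5),
    ("bill", "Repair", -0.5),
    ("broken", "Repair", 2.0),
    ("charge", "Billing", 1.0),
]


def score_directly(tokens, rules):
    """Sum rule weights per class, firing a rule once per matching token."""
    scores = {label: 0.0 for _, label, _ in rules}
    for tok in tokens:
        for word, label, weight in rules:
            if tok == word:
                scores[label] += weight
    return scores


def compile_to_arcs(rules):
    """Assumed encoding: one state per class, with a weighted self-loop per word.

    arcs[label][word] holds the total weight of reading `word` for `label`.
    Words without an arc are treated as weight-0 loops when scoring.
    """
    arcs = defaultdict(lambda: defaultdict(float))
    for word, label, weight in rules:
        arcs[label][word] += weight
    return arcs


def score_with_arcs(tokens, arcs):
    """Walk the token sequence through each class machine, adding arc weights."""
    return {
        label: sum(loops.get(tok, 0.0) for tok in tokens)
        for label, loops in arcs.items()
    }


if __name__ == "__main__":
    tokens = "my bill has a charge".split()
    arcs = compile_to_arcs(RULES)
    scores = score_with_arcs(tokens, arcs)
    print(scores, max(scores, key=scores.get))
```

In this toy setting the arc-based scores match the direct rule scores by construction. The practical appeal of a real WFST encoding is that it can be composed with a weighted lattice of ambiguous input rather than a single token sequence, which this sketch does not attempt.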
