A Hybrid Approach to NER by Integrating Manual Rules into MEMM

This paper describes a framework for defining domain-specific feature functions in a user-friendly form for use in a Maximum Entropy Markov Model (MEMM) for the Named Entity Recognition (NER) task. Our system, called MERGE, allows the definition of general feature function templates as well as linguistic rules that are incorporated into the classifier. We show a simple way of translating these rules into specific feature functions. We also show that, with a small amount of expert interaction in the form of rule tuning, MERGE can outperform both purely machine-learning-based systems and purely knowledge-based approaches.
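
To illustrate the general idea of turning a hand-written rule into a feature function, the following is a minimal sketch, not the MERGE implementation. It assumes the common maximum-entropy convention of binary feature functions over a (context, label) pair; the names `Context`, `rule_to_feature`, and `preceded_by_title`, as well as the title list and the `PERSON` label, are hypothetical and chosen only for this example.

```python
from typing import Callable, List, NamedTuple


class Context(NamedTuple):
    """Hypothetical observation context for one token position."""
    tokens: List[str]
    index: int
    prev_label: str  # an MEMM conditions on the previous label


Predicate = Callable[[Context], bool]
FeatureFunction = Callable[[Context, str], int]


def rule_to_feature(predicate: Predicate, target_label: str) -> FeatureFunction:
    """Wrap a rule predicate as a binary feature tied to one label."""
    def feature(ctx: Context, label: str) -> int:
        # Fires only when the rule matches AND the candidate label is the target.
        return 1 if label == target_label and predicate(ctx) else 0
    return feature


def preceded_by_title(ctx: Context) -> bool:
    """Example rule: a capitalized token directly after a personal title."""
    i = ctx.index
    if i == 0:
        return False
    return ctx.tokens[i - 1] in {"Mr.", "Mrs.", "Dr."} and ctx.tokens[i][:1].isupper()


person_after_title = rule_to_feature(preceded_by_title, "PERSON")

ctx = Context(tokens=["Dr.", "Smith", "arrived"], index=1, prev_label="O")
print(person_after_title(ctx, "PERSON"))  # 1
print(person_after_title(ctx, "O"))       # 0
```

In a sketch like this, the rule contributes evidence rather than a hard decision: the weight attached to the feature would still be learned from training data, which is one way a rule can be combined with a statistical classifier.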
