Relational Learning Assisted Construction of Rule Base for Indian Language NER

We present Inductive Logic Programming (ILP) based techniques for automatically extracting rules for Named Entity Recognition (NER) from tagged corpora and background knowledge. Results from using WARMR (Luc Dehaspe and Luc De Raedt 1997) and TILDE (Hendrik Blockeel and Luc De Raedt 1998) to learn rules for named entities in Hindi and Marathi show that the ILP approach has two advantages over hand-crafting NER rules: (i) development time is reduced by a factor of 120 compared to a linguist doing the entire rule development, and (ii) the learned rules give a complete and consistent view of all significant patterns in the data, at the level of abstraction specified through the mode declarations.