Chinese Named Entity Identification Using Class-based Language Model

We consider here the problem of Chinese named entity (NE) identification using statistical language model(LM). In this research, word segmentation and NE identification have been integrated into a unified framework that consists of several class-based language models. We also adopt a hierarchical structure for one of the LMs so that the nested entities in organization names can be identified. The evaluation on a large test set shows consistent improvements. Our experiments further demonstrate the improvement after seamlessly integrating with linguistic heuristic information, cache-based model and NE abbreviation identification.