Chinese Name Identification Integrated Decision Tree Learning

Chinese person name identification is a subfield of named entity identification in natural language processing. In this paper, identification is divided into three stages: extraction, classification, and disambiguation. Candidate Chinese person names are first extracted using statistical information. The morphological, syntactic, and semantic features of each candidate's context are then extracted to form a classification sample, so that evaluating a candidate is treated as a classification problem. We classify every candidate with a decision tree to determine whether it is a real Chinese person name. Finally, inconsistencies among the classification results are disambiguated. In experiments using this method, recall and precision are both above 90%.
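
The classification and disambiguation stages can be illustrated with a minimal sketch in Python. It makes several assumptions that the abstract does not confirm: candidates are taken as already extracted, the tree is learned with a plain ID3-style information-gain criterion, the two context features (prev_is_title, next_is_speech_verb) are hypothetical stand-ins for the morphological, syntactic, and semantic features, the training samples are invented toy data, and inconsistency is resolved by a simple majority vote over all occurrences of the same string.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def build_tree(rows, labels, features):
    """Learn an ID3-style tree over categorical features (assumed criterion).

    A leaf is a bare label; an inner node is (feature, branches, default).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(feature):
        groups = defaultdict(list)
        for row, label in zip(rows, labels):
            groups[row[feature]].append(label)
        remainder = sum(len(g) / len(labels) * entropy(g)
                        for g in groups.values())
        return entropy(labels) - remainder

    best = max(features, key=gain)
    remaining = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [y for _, y in pairs],
                                     remaining)
    return (best, branches, majority)

def classify(tree, row):
    """Walk the tree; unseen feature values fall back to the node majority."""
    while isinstance(tree, tuple):
        feature, branches, default = tree
        tree = branches.get(row[feature], default)
    return tree

def disambiguate(predictions):
    """Majority vote per candidate string (assumed rule; ties count as non-name)."""
    votes = defaultdict(list)
    for candidate, is_name in predictions:
        votes[candidate].append(is_name)
    return {c: 2 * sum(v) > len(v) for c, v in votes.items()}

# Invented toy samples: hypothetical context features of extracted candidates.
FEATURES = ["prev_is_title", "next_is_speech_verb"]
TOY_ROWS = [
    {"prev_is_title": True,  "next_is_speech_verb": True},
    {"prev_is_title": True,  "next_is_speech_verb": False},
    {"prev_is_title": False, "next_is_speech_verb": True},
    {"prev_is_title": False, "next_is_speech_verb": False},
    {"prev_is_title": False, "next_is_speech_verb": False},
]
TOY_LABELS = [True, True, True, False, False]  # True = real person name

tree = build_tree(TOY_ROWS, TOY_LABELS, FEATURES)
```

In this sketch each occurrence of a candidate is classified independently with classify(tree, features), and disambiguate then assigns one consistent label to every occurrence of the same string, which is one simple way to remove inconsistent decisions.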