Chinese Word Segmentation and Named Entity Recognition Based on Conditional Random Fields

Chinese word segmentation (CWS), named entity recognition (NER) and part-ofspeech tagging is the lexical processing in Chinese language. This paper describes the work on these tasks done by France Telecom Team (Beijing) at the fourth International Chinese Language Processing Bakeoff. In particular, we employ Conditional Random Fields with different features for these tasks. In order to improve NER relatively low recall; we exploit non-local features and alleviate class imbalanced distribution on NER dataset to enhance the recall and keep its relatively high precision. Some other post-processing measures such as consistency checking and transformation-based error-driven learning are used to improve word segmentation performance. Our systems participated in most CWS and POS tagging evaluations and all the NER tracks. As a result, our NER system achieves the first ranks on MSRA open track and MSRA/CityU closed track. Our CWS system achieves the first rank on CityU open track, which means that our systems achieve state-of-the-art performance on Chinese lexical processing.