Part-of-Speech Tagging with Both Character and Word Information

Part-of-speech tagging, the task of assigning an appropriate grammatical category to each word in a sentence, is one of the basic tasks of natural language processing. Previous part-of-speech tagging methods mostly model the co-occurrence probabilities of adjacent parts of speech at the word level and do not analyze the internal structure of words. In this paper, we propose a maximum-entropy-based Chinese part-of-speech tagger that uses not only word and part-of-speech information but also the character information inside each word. Our approach achieves an error reduction of 61.3% compared with the approach that uses only word information.
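
To illustrate how word-level and character-level information might be combined, the following minimal sketch builds a sparse feature set for one word in a segmented sentence. The function name extract_features, the feature templates, and the boundary symbols <BOS> and <EOS> are assumptions made for illustration; they are not the templates used in this paper. A maximum entropy classifier (equivalently, multinomial logistic regression) could be trained on features of this kind.

```python
def extract_features(words, i, prev_tag):
    """Return a sparse feature dict for words[i] (hypothetical templates).

    words    : list of non-empty word strings (an already segmented sentence)
    i        : index of the word to be tagged
    prev_tag : tag assigned to the previous word ("<BOS>" at sentence start)
    """
    word = words[i]
    prev_word = words[i - 1] if i > 0 else "<BOS>"
    next_word = words[i + 1] if i + 1 < len(words) else "<EOS>"

    feats = {
        # Word-level and tag-level context features
        "w0=" + word: 1.0,
        "w-1=" + prev_word: 1.0,
        "w+1=" + next_word: 1.0,
        "t-1=" + prev_tag: 1.0,
        # Character-level features from inside the word
        "c_first=" + word[0]: 1.0,
        "c_last=" + word[-1]: 1.0,
        # Word length in characters, capped at 5 to limit sparsity
        "len=" + str(min(len(word), 5)): 1.0,
    }
    # One indicator feature per character occurring in the word
    for ch in word:
        feats["c_in=" + ch] = 1.0
    return feats


if __name__ == "__main__":
    sentence = ["我", "喜欢", "语言学"]
    for name in sorted(extract_features(sentence, 2, "VV")):
        print(name)
```

Character features such as the last character of a word can carry a signal about its category even when the whole word was never seen in training, which is the intuition behind looking inside the word.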