Efficient Dependency Analysis for Korean Sentences Based on Lexical Association and Multi-layered Chunking

Syntactic analysis plays an important role in natural language processing applications such as document processing and question answering. For parsed results to be usable in such applications, the parser must be efficient without sacrificing accuracy. In this paper, we present a method for Korean dependency analysis that uses three types of chunking together with lexical co-occurrences extracted from a large corpus. Chunking, which is crucial for reducing the number of disambiguation decisions during parsing, is performed using a finite-state transducer and lexical collocations. Lexical information also has a great impact on parsing a free word order language such as Korean, because in such languages lexical association is more informative than word order. The proposed parser is a hybrid system guided by statistical data and syntactic rules, and it analyses sentences from right to left in order to handle Korean, a head-final language, effectively. Experiments show that the method improves both accuracy and efficiency by eliminating irrelevant parsing decisions.
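To make the idea of right-to-left analysis for a head-final language concrete, the following is a minimal sketch, not the authors' parser. It assumes that every word's head lies to its right, that the tree is projective, and that attachment is chosen greedily by a lexical association score. It omits chunking and syntactic rules entirely. The functions `right_frontier`, `parse_right_to_left`, and `toy_assoc`, the romanized example words, and the association values are all hypothetical stand-ins for corpus-derived co-occurrence statistics.

```python
from typing import Callable, Dict, List, Optional

Heads = Dict[int, Optional[int]]


def right_frontier(heads: Heads, start: int) -> List[int]:
    """Follow head links from `start` up to the root.

    Assuming all heads lie to the right, these are the only positions the
    word just left of `start` can attach to without crossing an existing arc.
    """
    chain: List[int] = []
    node: Optional[int] = start
    while node is not None:
        chain.append(node)
        node = heads[node]
    return chain


def parse_right_to_left(words: List[str],
                        assoc: Callable[[str, str], float]) -> Heads:
    """Greedy right-to-left dependency attachment (hypothetical sketch).

    Returns a map from word index to head index; the rightmost word is the
    root (head None), as expected in a head-final sentence.
    """
    n = len(words)
    if n == 0:
        return {}
    heads: Heads = {n - 1: None}
    for i in range(n - 2, -1, -1):
        candidates = right_frontier(heads, i + 1)
        # Pick the candidate with the highest association; ties keep the
        # nearest candidate because it is examined first.
        best = candidates[0]
        best_score = assoc(words[i], words[best])
        for j in candidates[1:]:
            score = assoc(words[i], words[j])
            if score > best_score:
                best, best_score = j, score
        heads[i] = best
    return heads


# Hypothetical association scores (dependent, head) -> strength.
TOY_TABLE = {
    ("na-neun", "ilk-eotda"): 2.0,    # "I-TOP"    -> "read"
    ("chaek-eul", "ilk-eotda"): 3.0,  # "book-ACC" -> "read"
    ("na-neun", "chaek-eul"): 0.1,    # "I-TOP"    -> "book-ACC"
}


def toy_assoc(dependent: str, head: str) -> float:
    return TOY_TABLE.get((dependent, head), 0.0)


if __name__ == "__main__":
    sentence = ["na-neun", "chaek-eul", "ilk-eotda"]
    print(parse_right_to_left(sentence, toy_assoc))  # {2: None, 1: 2, 0: 2}
```

Processing from the right means that when a word is considered, every possible head to its right has already been attached. The candidate set is therefore just the chain of heads above the next word. This is one way a right-to-left strategy can limit the number of attachment decisions in a head-final language.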