Chinese Typed Collocation Extraction using Corpus-based Syntactic Collocation Patterns

Collocations play a significant role in many applications, and extracting them automatically is useful in NLP. Syntax-based phrase patterns offer two advantages in collocation extraction: the results are well-formed, and the candidates are automatically classified into syntactically congeneric categories. However, because such patterns are language-dependent, choosing them arbitrarily for the target collocations makes both evaluation and adaptation to a new language difficult. This work presents a corpus-driven framework that first generates collocation templates for noun and verb phrases and then integrates them with statistical association measures to extract noun-phrase and verb-phrase collocations, a task we call typed collocation extraction. Experimental results show a high average precision of 84.80% and a so-called local recall of 55.99% on a set of randomly selected noun and verb headwords.