Paper Information - Chinese Statistical Parsing with Rich Linguistic Features

Chinese Statistical Parsing with Rich Linguistic Features

Knowledge acquisition is widely regarded as a bottleneck in many NLP tasks, such as machine translation and information extraction. Treebank-based statistical parsing is no exception. The latent linguistic knowledge in a treebank is very rich, but it cannot be acquired directly. Our model incorporates such rich linguistic features into Chinese statistical parsing in three ways. First, non-recursive noun and verb phrases are annotated in the Penn Chinese Treebank because they serve as strong boundary markers. Second, a new head percolation table is designed based on Xia's table. Third, our model uses the context configuration frame, which provides a stronger representation of bilexical dependency structures. Together, these three linguistic features yield an improvement of 2.37% in F1 measure and 5.36% in complete match ratio.
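As a rough illustration of what a head percolation table does, the sketch below picks the head child of a constituent from a small lookup table. It is a minimal sketch under stated assumptions: the table entries, the name `HEAD_TABLE`, and the function `find_head` are hypothetical and are not taken from Xia's table or from the table designed in this paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table: parent label -> (scan direction, child labels in priority order).
# The entries are illustrative only; they do not reproduce Xia's table or the paper's table.
HEAD_TABLE: Dict[str, Tuple[str, List[str]]] = {
    "NP": ("right", ["NN", "NR", "NP"]),
    "VP": ("left", ["VV", "VP"]),
    "IP": ("right", ["VP", "IP"]),
}


def find_head(parent: str, children: List[str]) -> int:
    """Return the index of the head child among the non-empty list `children`."""
    # Unknown parent labels fall back to a right-to-left scan with no priorities.
    direction, priorities = HEAD_TABLE.get(parent, ("right", []))
    indices = list(range(len(children)))
    if direction == "right":
        indices.reverse()  # scan from the rightmost child
    # Try each priority label in turn, scanning children in the chosen direction.
    for label in priorities:
        for i in indices:
            if children[i] == label:
                return i
    # No priority label matched: take the first child in scan order.
    return indices[0]
```

For example, `find_head("VP", ["AD", "VV", "NP"])` selects the `VV` child. In a lexicalized parser, the head word of that child would then be percolated up to the parent, which is what makes bilexical dependencies between constituents available to the model.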

Xiong De-yi

[1] Roger Levy, et al. Is it Harder to Parse Chinese, or the Chinese Treebank?, 2003, ACL.

[2] David Chiang, et al. Recovering Latent Information in Treebanks, 2002, COLING.

[3] David Chiang, et al. Two Statistical Parsing Models Applied to the Chinese Treebank, 2000, ACL 2000.

[4] Dan Klein, et al. Accurate Unlexicalized Parsing, 2003, ACL.

[5] Fei Xia, et al. Automatic grammar generation from two different perspectives, 2001.

[6] Qun Liu, et al. Lexicalized Beam Thresholding Parsing with Prior and Boundary Estimates, 2005, CICLing.