Knowledge acquisition is widely regarded as a bottleneck in many NLP tasks, such as machine translation and information extraction. Treebank-based statistical parsing is no exception. The latent linguistic knowledge in a treebank is very rich, but it cannot be acquired directly. In our model, three methods are used to incorporate such rich linguistic features into Chinese statistical parsing. First, non-recursive noun and verb phrases are annotated in the Penn Chinese Treebank because they provide strong boundary markers. Second, a new head percolation table is designed based on Xia's table. Third, our model uses the context configuration frame, which provides a stronger representation of bilexical dependency structures. Together, these three linguistic features yield a remarkable improvement of 2.37% in F1 measure and 5.36% in complete match ratio.
[1] Roger Levy et al. Is it Harder to Parse Chinese, or the Chinese Treebank? ACL, 2003.
[2] David Chiang et al. Recovering Latent Information in Treebanks. COLING, 2002.
[3] David Chiang et al. Two Statistical Parsing Models Applied to the Chinese Treebank. ACL, 2000.
[4] Dan Klein et al. Accurate Unlexicalized Parsing. ACL, 2003.
[5] Fei Xia et al. Automatic grammar generation from two different perspectives. 2001.
[6] Qun Liu et al. Lexicalized Beam Thresholding Parsing with Prior and Boundary Estimates. CICLing, 2005.