A rule-based method for commas' disambiguation in Chinese patent text

We describe a rule-based method for disambiguating Chinese commas in patent text, intended to benefit Chinese-English patent machine translation (MT). We annotated ten thousand sentences of patent text and derived a set of rules from the annotations. In experiments on five complete patent documents containing 1,219 commas, the method achieves an overall accuracy of over 90%.
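The abstract does not list the rules themselves. The minimal Python sketch below only illustrates the general shape of an ordered, first-match rule set that labels each full-width comma in a sentence, plus the accuracy measure computed over labeled commas. The two-way label set (`boundary` / `non_boundary`), the connective list, the length threshold, and every function name here are hypothetical assumptions for illustration, not the authors' rules or categories.

```python
from typing import Callable, List, Optional

# Hypothetical label set; the abstract does not specify the comma categories.
BOUNDARY = "boundary"          # comma separates clauses that could be split
NON_BOUNDARY = "non_boundary"  # comma stays inside one clause

# A rule inspects the segments on either side of a comma and returns a label or None.
Rule = Callable[[str, str], Optional[str]]


def rule_connective(left: str, right: str) -> Optional[str]:
    # Illustrative only: a following connective such as "其中" (wherein) opens a new clause.
    if right.startswith(("其中", "并且", "从而")):
        return BOUNDARY
    return None


def rule_short_left(left: str, right: str) -> Optional[str]:
    # Illustrative only: a very short preceding segment is kept in the same clause.
    if len(left) <= 2:
        return NON_BOUNDARY
    return None


# Rules are tried in this order; the first one that returns a label wins.
RULES: List[Rule] = [rule_connective, rule_short_left]


def classify_commas(sentence: str, rules: List[Rule] = RULES,
                    default: str = NON_BOUNDARY) -> List[str]:
    """Label every full-width comma in `sentence`, falling back to `default`."""
    segments = sentence.split("，")
    labels: List[str] = []
    for i in range(len(segments) - 1):
        left, right = segments[i], segments[i + 1]
        label = default
        for rule in rules:
            result = rule(left, right)
            if result is not None:
                label = result
                break
        labels.append(label)
    return labels


def accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of commas whose predicted label matches the annotated label."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need equal-length, non-empty label lists")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, `classify_commas("该装置包括底座，支架和电机，其中电机固定在支架上。")` ("The device comprises a base, a bracket and a motor, wherein the motor is fixed to the bracket.") labels the first comma `non_boundary` by default and the second `boundary` because the next segment starts with "其中". For scale, an accuracy above 90% on 1,219 commas means at least 1,098 commas were labeled correctly (0.9 × 1,219 = 1,097.1).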

Yaohong Jin | Yun Zhu | Lixia Wang | Qianqian Song
