A rule-based method for commas' disambiguation in Chinese patent text

We describe a rule-based method for disambiguating Chinese commas in patent text, intended to benefit Chinese-English patent machine translation (MT). We annotated ten thousand sentences of patent text and derived a set of rules from the annotations. Experiments on five complete patent documents containing 1,219 commas show that our model achieves an overall accuracy of over 90%.
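
As a rough illustration of what a rule-based comma classifier and its accuracy measurement can look like, the following minimal sketch applies an ordered list of rules to each full-width comma in a sentence. The two labels, the cue words, the length threshold, and all function names are assumptions made for this example only; they are not the categories or rules derived from the annotated data.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical labels: the abstract does not list the actual comma categories.
CLAUSE = "clause_boundary"   # comma separates clauses
PHRASE = "phrase_internal"   # comma separates items inside a phrase

# A rule sees the text left and right of one comma and returns a label,
# or None when it does not apply.
Rule = Callable[[str, str], Optional[str]]

# Illustrative cue words only (assumed, not taken from the paper).
CONNECTIVES = ("其中", "并且", "从而", "其特征在于")
COORDINATORS = ("和", "或", "及")


def rule_connective(left: str, right: str) -> Optional[str]:
    """A comma followed by a clause connective is treated as a clause boundary."""
    return CLAUSE if right.startswith(CONNECTIVES) else None


def rule_enumeration(left: str, right: str) -> Optional[str]:
    """A comma followed by a short segment containing a coordinator is
    treated as part of an enumeration (threshold of 8 characters is assumed)."""
    next_segment = right.split("，")[0]
    if len(next_segment) <= 8 and any(c in next_segment for c in COORDINATORS):
        return PHRASE
    return None


RULES: List[Rule] = [rule_connective, rule_enumeration]


def classify_commas(sentence: str,
                    rules: Sequence[Rule] = RULES,
                    default: str = CLAUSE) -> List[str]:
    """Return one label per full-width comma; the first matching rule wins."""
    labels = []
    for i, ch in enumerate(sentence):
        if ch != "，":
            continue
        left, right = sentence[:i], sentence[i + 1:]
        label = default
        for rule in rules:
            result = rule(left, right)
            if result is not None:
                label = result
                break
        labels.append(label)
    return labels


def accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of commas whose predicted label matches the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

With such a setup, the reported figure would correspond to `accuracy` computed over all 1,219 commas of the test documents, with gold labels taken from manual annotation.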