Research on Sentence Segmentation with Conjunctions in Patent Machine Translation

The processing of long sentences is a difficult problem in machine translation. Previous work has relied on punctuation to segment such sentences. In this paper, we present a rule-based method for sentence segmentation with conjunctions to improve the performance of long-sentence machine translation in patent text. We divide conjunctions into different LEVELs according to the semantic features of the verbs that precede and follow them. We then formulate a number of rules, based on the LEVELs of the conjunctions, to segment long Chinese sentences into shorter ones. We conducted experiments on 10 complete patent documents containing 901 conjunctions. Our method achieves an overall accuracy of over 89%. This result indicates that our method can effectively improve the translation of long patent sentences.
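
To make the idea of LEVEL-based segmentation concrete, the following is a minimal sketch rather than the method itself. The abstract does not list the actual LEVELs or rules, so the conjunction table `CONJ_LEVELS`, the numeric levels, the `min_level` threshold, and the function name `segment` are all hypothetical. The sketch also assumes the sentence is already tokenized and that a LEVEL has already been assigned to each conjunction; it does not model the verb-based semantic analysis used to assign LEVELs.

```python
from typing import Dict, List

# Hypothetical LEVEL table (an assumption for illustration only):
# a higher LEVEL is taken to mean the conjunction joins clauses,
# a lower LEVEL that it joins phrases inside one clause.
CONJ_LEVELS: Dict[str, int] = {
    "并且": 2,  # "and also", assumed here to join clauses
    "然后": 2,  # "then", assumed here to join clauses
    "和": 1,    # "and", assumed here to join noun phrases
}


def segment(tokens: List[str], min_level: int = 2) -> List[List[str]]:
    """Split a tokenized sentence before each conjunction whose LEVEL >= min_level."""
    segments: List[List[str]] = []
    current: List[str] = []
    for tok in tokens:
        # Start a new segment only at a high-LEVEL conjunction,
        # and only if the current segment is non-empty.
        if CONJ_LEVELS.get(tok, 0) >= min_level and current:
            segments.append(current)
            current = []
        current.append(tok)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    sentence = ["装置", "包括", "壳体", "和", "盖板", "并且", "壳体", "连接", "底座"]
    for part in segment(sentence):
        print(" ".join(part))
```

In this sketch the low-LEVEL conjunction 和 leaves the sentence intact, while the high-LEVEL conjunction 并且 starts a new segment and stays at the head of it. A real rule set would presumably condition on the surrounding verbs rather than on a fixed lookup table.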