Use of Heuristic Knowledge in Chinese Language Analysis

This paper describes an analysis method that uses heuristic knowledge to find local syntactic structures in Chinese sentences. We call it preprocessing because it is applied before the global syntactic structure analysis (1) of the input sentence. Its purpose is to guide the global analysis through the search space and so avoid unnecessary computation.

To realize this, we use a set of special words that appear in commonly used patterns in Chinese, which we call "characteristic words". They enable us to pick out fragments that might figure in the syntactic structure of the sentence. Knowledge about how characteristic words are used lets us rate alternative fragments according to pattern statistics, fragment length, distance between characteristic words, and so on. The preprocessing system proposes the most "likely" partial structure to the global analysis level. If this choice is rejected, backtracking looks for a second choice, and so on.

For our system, we use 200 characteristic words. Their rules are written as 101 automata. We tested them on 120 sentences taken from a Chinese physics textbook. For this limited set, correct partial structures were proposed as the first choice for 94% of the sentences. Allowing a second choice raises the score to 98%, and allowing a third choice raises it to 100%.
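
The rating-and-backtracking idea described above can be illustrated with a minimal sketch. It is not the system's actual automata or scoring: the pattern table, the weights, the scoring formula (weight divided by distance), and the names `PATTERNS`, `Fragment`, `candidate_fragments`, and `propose` are all assumptions made for illustration. The sketch pairs an opening characteristic word with a closing one, rates each candidate fragment, and offers candidates to a stand-in for the global analysis in order of decreasing score.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical patterns: (opening word, closing word, prior weight).
# The weights stand in for "pattern statistics" and are invented here.
PATTERNS = [
    ("在", "上", 0.9),  # locative frame, roughly "on ..."
    ("在", "中", 0.8),  # locative frame, roughly "in ..."
    ("当", "时", 0.7),  # temporal frame, roughly "when ..."
]


@dataclass(frozen=True)
class Fragment:
    start: int    # index of the opening characteristic word
    end: int      # index of the closing characteristic word (inclusive)
    score: float  # higher means more likely under this toy rating


def candidate_fragments(tokens: List[str], max_span: int = 6) -> List[Fragment]:
    """Collect fragments bounded by a pattern's two words, best score first."""
    found = []
    for i, tok in enumerate(tokens):
        for opener, closer, weight in PATTERNS:
            if tok != opener:
                continue
            # Look for the closing word within max_span tokens of the opener.
            for j in range(i + 1, min(i + 1 + max_span, len(tokens))):
                if tokens[j] == closer:
                    distance = j - i
                    # Assumed rating: shorter distances score higher.
                    found.append(Fragment(i, j, weight / distance))
    return sorted(found, key=lambda f: f.score, reverse=True)


def propose(tokens: List[str],
            accept: Callable[[Fragment], bool]) -> Optional[Fragment]:
    """Offer candidates in rank order; a rejection falls back to the next one."""
    for frag in candidate_fragments(tokens):
        if accept(frag):
            return frag
    return None


tokens = ["书", "在", "桌子", "上", "的", "盒子", "中"]
first_choice = propose(tokens, accept=lambda f: True)
# Simulate the global analysis rejecting the first proposal.
second_choice = propose(tokens, accept=lambda f: f.end != 3)
```

Here `accept` plays the role of the global analysis level: when it rejects the top-ranked fragment, the loop simply moves on to the next candidate, which mirrors the first, second, and third choices reported in the evaluation.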