Clause processing in complex sentences

The purpose of this investigation is to propose and test an algorithm for segmenting complex sentences into clauses. The algorithm runs after a part of speech has been assigned to each lexical item. Formal indicators of subordination and coordination, together with information about the valence of the verbs in the immediate context, are used to mark the beginning and end of each clause. Once the clauses have been identified, they are classified as either noun clauses or adverbial clauses, using information provided by the surrounding context. The algorithm was tested with a machine translation system developed by the author, which included an English/Portuguese dictionary, a part-of-speech tagger, and a facility for introducing rules, including those required by the algorithm. Of 1,659 clauses randomly selected from a 10,000,000-word corpus, more than 98% were correctly segmented and 95% were correctly classified as noun or adverbial clauses.
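The abstract does not state the rules themselves. What follows is a minimal Python sketch of the general idea only. It is not the author's algorithm or system. It assumes the input is already POS-tagged with Universal Dependencies-style tags (`VERB`, `SCONJ`, `CCONJ`, `PUNCT`, ...). The word sets `NOUN_ONLY` and `CLAUSAL_VERBS` are hypothetical stand-ins for the subordinator and verb-valence information, and the helpers `verb_ahead`, `classify` and `segment` are illustrative names. A subordinator opens a new clause. A coordinator or a comma opens one only when a verb lies on both sides of it. The verb just before a subordinator decides between a noun clause and an adverbial clause (for example, "wonder if ..." versus a bare "if ...").

```python
from dataclasses import dataclass, field

# Hypothetical lexicons (assumptions, not taken from the paper).
NOUN_ONLY = {"that", "whether"}            # subordinators assumed to open noun clauses
CLAUSAL_VERBS = {"said", "know", "knows", "wonder", "asked", "think"}  # assumed clausal-object verbs


@dataclass
class Clause:
    kind: str                               # "main", "noun" or "adverb"
    words: list = field(default_factory=list)


def verb_ahead(tagged, start):
    """True if a verb occurs from `start` before the next boundary marker."""
    for word, tag in tagged[start:]:
        if tag in ("SCONJ", "CCONJ") or word == ",":
            return False
        if tag == "VERB":
            return True
    return False


def classify(marker, prev):
    """Label a subordinate clause as noun or adverb from its marker and the preceding token."""
    if marker in NOUN_ONLY:
        return "noun"
    if prev is not None and prev[1] == "VERB" and prev[0] in CLAUSAL_VERBS:
        return "noun"                       # e.g. "wonder if ..."
    return "adverb"                         # e.g. "because ...", sentence-initial "if ..."


def segment(tagged):
    """Split a list of (word, tag) pairs into labelled clauses."""
    clauses = [Clause("main")]
    has_verb = False                        # does the current clause contain a verb yet?
    for i, (word, tag) in enumerate(tagged):
        cur = clauses[-1]
        prev = tagged[i - 1] if i > 0 else None
        if tag == "SCONJ":                  # subordinator: always opens a new clause
            clauses.append(Clause(classify(word.lower(), prev), [word]))
            has_verb = False
            continue
        if tag == "CCONJ" and has_verb and verb_ahead(tagged, i + 1):
            clauses.append(Clause(cur.kind, [word]))   # coordinated clause keeps the kind
            has_verb = False
            continue
        if word == "," and cur.kind != "main" and has_verb and verb_ahead(tagged, i + 1):
            cur.words.append(word)          # comma closes the subordinate clause
            clauses.append(Clause("main"))
            has_verb = False
            continue
        cur.words.append(word)
        has_verb = has_verb or tag == "VERB"
    return [c for c in clauses if c.words]  # drop an empty leading main clause


sentence = [("He", "PRON"), ("left", "VERB"), ("because", "SCONJ"), ("she", "PRON"),
            ("said", "VERB"), ("that", "SCONJ"), ("it", "PRON"), ("rained", "VERB")]
for clause in segment(sentence):
    print(clause.kind, " ".join(clause.words))
# main He left
# adverb because she said
# noun that it rained
```

The sketch marks where clauses start and lets each clause run to the next boundary. It does not track embedded clauses that resume the main clause mid-sentence, and it collapses everything into the noun/adverb distinction the abstract describes.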