The Importance of Nouns in Text Processing

Lauren M. Stuart, Julia M. Taylor, and Victor Raskin ({lstuart, jtaylor1, vraskin}@purdue.edu)
Purdue University, 656 Oval Drive
West Lafayette, IN 47907-2086 USA

Abstract

In the area of computational processing of natural language texts, advances toward simpler yet more accurate models of meaning are desirable. As syntax is a major component of semantic analysis, we explore how a long-term institutional bias towards the verb as the main determiner of syntactic (and semantic) structure may underserve some kinds of information. We introduce an analysis paradigm that restores the noun to some importance in syntactic analysis. A noun-driven syntax representation has been developed and evaluated, and implications of its use in further processing and in better modeling of natural language meaning are investigated.

Keywords: Linguistics, Language Understanding, Human-Computer Interaction, Representation, Syntax, Semantics

Introduction

Text interpretation by computers is highly desirable and arguably necessary as we continue to produce and analyze text. One major benefit to the improvement of natural language understanding (NLU) for text is more intuitive natural interaction with highly structured or, conversely, loosely associated, large stores of information. Text processing may proceed sequentially, on the assumption that only full (or major) analysis of the surface-ward structure yields the next deepest structure, where deepest structure is some formulation of the text’s meaning, possibly applicable to other meanings of other texts, and the surface structure is the written or spoken input for a computer.

This linear process encourages the development of incremental processing modules; that is, given some intermediate representation of something going on in the text, the module will produce a further refined model according to its internal rules and heuristics. For an example, take a phonetic processor that processes speech data and outputs a series of symbols for use in phonological and morphological analysis.

The process does not have to be linear; an alternative approach may parallelize the different analyses to some degree, even to the maximum possible (for instance, we cannot process any syntactic data if it has not yet been furnished). In this approach, iterations of processing may “clear up” a map of the sentence’s interpretation (meaning) incrementally. Easy or simple rule applications start the process and such selections provide feedback for further selections in those areas of the map that are not yet clear, for some threshold of “clear” that is dependent upon the form and eventual use of the data.

Linear or not, any processing of a text from its surface form to some model of its meaning relies on various stages of language processing. We wish to explore how a bias in one of these stages, and its correction, affects processing in another.

Sentence Processing

As a sentence is analyzed, much importance is given to the verb(s): modal verbs modify the main verb; noun phrases participate as subjects or objects; any noun phrase not directly related to the verb may be a complement of a preposition, which is itself associated with the verb or a noun phrase, or of some other clause or phrase whose meaning props up the meaning of the verb (the event said to be encoded in this particular sentence) and whose place or expression in the syntactic structure of the sentence is as much (or more) dictated by the verb as the head of the phrase. Nouns generally remain building blocks of arguments to be fed into a verb.
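The verb-centered organization described above can be made concrete by measuring how deep each word sits in a phrase-structure tree. The following is a minimal sketch only: the tree is a hand-built toy example with Penn Treebank-style labels (an assumption, not output from any parser), and the names `TREE`, `leaf_depths`, and `mean_depth` are hypothetical helpers introduced for illustration.

```python
# Hypothetical hand-built constituency tree; NOT actual parser output.
# An internal node is (label, child, child, ...); a preterminal is (POS tag, word).
TREE = (
    "S",
    ("NP", ("DT", "the"), ("NN", "dog")),
    ("VP",
        ("VBD", "chased"),
        ("NP", ("DT", "a"), ("NN", "cat")),
        ("PP", ("IN", "in"),
            ("NP", ("DT", "the"), ("NN", "garden")))),
)


def leaf_depths(node, depth=0):
    """Yield (word, tag, depth) for each word; depth is that of its POS node, root = 0."""
    label, *children = node
    # A preterminal has exactly one child, and that child is the word string.
    if len(children) == 1 and isinstance(children[0], str):
        yield children[0], label, depth
        return
    for child in children:
        yield from leaf_depths(child, depth + 1)


def mean_depth(tree, tag_prefix):
    """Average depth of all words whose POS tag starts with tag_prefix."""
    depths = [d for _, tag, d in leaf_depths(tree) if tag.startswith(tag_prefix)]
    return sum(depths) / len(depths)


if __name__ == "__main__":
    for word, tag, depth in leaf_depths(TREE):
        print(f"{word:8s} {tag:4s} depth={depth}")
    print("mean verb depth:", mean_depth(TREE, "VB"))
    print("mean noun depth:", mean_depth(TREE, "NN"))
```

In this toy tree the verb is never deeper than any noun, and nouns inside objects and prepositional phrases sit progressively further from the root. A single invented sentence is only an illustration of the pattern, not evidence for it.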
Figure 1: Representation of a phrase structure parse from Berkeley Parser

Figure 2: Representation of a dependency parse from Stanford Typed Dependencies

In syntactic representations, this privileging of the verb is typically expressed as a shorter distance between the root of the tree and the main verb; some representations go so far as to shorten the paths to the other verbs in the sentence. See Figure 1 for a constituency tree representation, and note the comparative height of the verb and its nouns. Then see