Multilingual Corpus-based Approach to the Resolution of English -ing

Corpus data has proven useful for resolving ambiguities in NLP. A number of studies, for example, have dealt with disambiguating English PP attachments using corpus data (Hindle and Rooth (1993), Brill and Resnik (1994), Stetina and Nagao (1997), Ratnaparkhi (1998), and Pantel and Lin (2000), among others). This paper explores a novel approach to resolving ambiguities associated with -ing + Noun constructions in English. We use an aligned multilingual (English, Spanish, French, German and Japanese) corpus to extract the lexical information necessary for disambiguation. Our premise is that while -ing constructions are highly ambiguous in English, the corresponding constructions in other languages may not be, and can thus provide English with disambiguating information. We argue that with aligned multilingual corpora, languages can learn non-trivial linguistic information from one another.

1. Ambiguities in English -ing constructions

Different syntactic and semantic relationships can exist in English between an -ing verb form and a following noun. At the syntactic level, an NLP system must decide whether the -ing + noun construction is a verb + object pair or a modifier + noun pair. For example, in (1a) using is a verb with the object passwords, whereas in (1b) testing is a modifier of purposes.

(1a) Click to learn more about using passwords with your identity.
(1b) For testing purposes, click Next.

For the purpose of translation, we often need to specify what type of modification relationship exists between an -ing form and the following noun in a noun phrase. In (1b) the relationship of testing to purposes might be considered one of adjunct to noun, as in the paraphrase purposes of testing. But in other constructions that are syntactically similar, the noun following the -ing form may be better thought of as the subject of the -ing verb.
So, in (1c) the noun rows might be interpreted as the subject of matching, as in the paraphrase rows that match.

(1c) It specifies that matching rows returned by the query match a list of words.

Certainly, a similar paraphrase, i.e., purposes that test, is not possible for (1b). In this paper we explore the automatic extraction of the information necessary to distinguish verb + object constructions (such as (1a)) from modifier + noun constructions (such as (1b) and (1c)).

2. -Ing constructions in other languages

While the -ing + Noun construction is often ambiguous in English, other languages use various linguistic devices, often unambiguous in nature, to instantiate the different relationships between the parts of the construction. For example, the NP licensing information in (2a), in which licensing is a modifier of the noun information (i.e., ‘information for licensing’), is likely to be expressed as a compound noun in languages such as Japanese or German, as shown in (2b) and (2c). In languages such as French or Spanish, on the other hand, the same type of modifier + noun relationship is likely to be expressed as a noun + prepositional phrase construction (‘information about licensing’), as shown in (2d) and (2e).

(2a) English: When the number of users is different from the number of computers, this may provide incorrect licensing information.
(2b) Japanese: ユーザーの数がコンピュータの数と異なる場合は、正しいライセンス情報が