Complex predicates in Indian languages and wordnets

Wordnets, which are repositories of lexical semantic knowledge containing semantically linked synsets and lexically linked words, are indispensable for work in computational linguistics and natural language processing. While building wordnets for Hindi and Marathi, two major Indo-European languages, we observed that the verb hierarchy in the Princeton WordNet is rather shallow. We therefore set out to construct a verb knowledge base for Hindi that arranges Hindi verbs in a hierarchy based on the is-a (hypernymy) relation. In doing so, we found that some phenomena particular to Indian languages bear on the choice between storing an expression in the lexicon (lexicalization) and deriving it syntactically. One such phenomenon is the occurrence of conjunct and compound verbs, together called complex predicates, which are found in all Indian languages. This paper presents our experience in constructing lexical knowledge bases for Indian languages, with special attention to Hindi. We address the question of storing versus deriving complex predicates from both a linguistic and a computational standpoint. We have constructed empirical tests to decide whether a combination of two words, the second of which is a verb, is a complex predicate. These tests provide a principled way of deciding the status of complex predicates in Indian language wordnets.
