Explicit Fine-Grained Syntactic and Semantic Annotation of the Idafa Construction in Arabic

Idafa in traditional Arabic grammar is an umbrella construction that covers several phenomena, including what is expressed in English as noun-noun compounds and Saxon and Norman genitives. Additionally, Idafa participates in other constructions, such as quantifiers, quasi-prepositions, and adjectives. Identifying the various types of the Idafa construction (IC) is important for Natural Language Processing (NLP) applications. Noun-noun compounds exhibit special behavior in most languages, which affects their semantic interpretation; hence, distinguishing them could have an impact on downstream NLP applications. The most comprehensive syntactic representation of the Arabic language is the LDC Arabic Treebank (ATB). In the ATB, ICs are not explicitly labeled, and furthermore there is no distinction between ICs of noun-noun relations and other traditional ICs. Hence, we devise a detailed syntactic and semantic typification process of the IC phenomenon in Arabic. We target the ATB as a platform for this classification. We render the ATB annotated not only with explicit IC labels but also with a further semantic characterization, which is useful for syntactic, semantic, and cross-language processing. Our typification of IC comprises three main syntactic IC types (FIC, GIC, and TIC), which are further divided into 10 syntactic subclasses. The TIC group is further classified into semantic relations. We devise a method for automatic IC labeling and compare its yield against the CATiB treebank. Our evaluation shows that we achieve the same level of accuracy, but with the additional fine-grained classification into the various syntactic and semantic types.
