Semantic Clustering: an Attempt to Identify Multiword Expressions in Bengali

One of the key issues in both natural language understanding and generation is the appropriate processing of Multiword Expressions (MWEs). An MWE can be defined as a phrase whose meaning cannot be obtained from its constituents in a straightforward manner. This paper presents an approach to identifying bigram noun-noun MWEs in a medium-size Bengali corpus by clustering semantically related nouns and using a vector space model to measure similarity. Incorporating the English WordNet::Similarity module further improves the results considerably. The approach also helps locate clusters of synonymous nouns present in a document. Experimental results, evaluated in terms of Precision, Recall and F-score, are satisfactory.
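As a rough illustration of the vector space similarity mentioned above, the following minimal sketch builds co-occurrence vectors for tokens and compares two of them with cosine similarity. The function names, the context-window representation, and the window size are assumptions made for illustration; the abstract does not specify how the vectors are constructed or how similarity scores are turned into MWE decisions.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Only dimensions present in both vectors contribute to the dot product.
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)


def context_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions.

    `sentences` is a list of token lists. This co-occurrence representation
    is an assumed stand-in, not necessarily the one used in the paper.
    """
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            context = tokens[start:i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(context)
    return vectors


def bigram_similarity(vectors, first, second):
    """Similarity between the two components of a candidate noun-noun bigram."""
    return cosine(vectors.get(first, Counter()), vectors.get(second, Counter()))
```

Scores like these could feed a clustering step that groups nouns whose vectors lie close together; the threshold or clustering criterion would have to be chosen for the corpus at hand.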
