Support Vector Machine Applied to the Semantic Interpretation of VN Compound

The semantic interpretation of nominal compounds is one of the most difficult problems in natural language processing. VN compounds are a subset of nominal compounds in which the modifier is a verb nominalization. This paper proposes a new interpretation model in which a support vector machine (SVM) labels Chinese VN compounds with one of five semantic relations. The World Wide Web is used as a large corpus for computing pointwise mutual information (PMI) between each VN compound and a set of relation-specific lexical patterns. These Web-based statistics serve as the classification features for the SVM. Applying a sub-linear transformation and discretization to the raw statistics yields good classification results across the five semantic relations.
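The abstract describes the feature pipeline only at a high level, so the following is a minimal sketch under explicit assumptions, not the authors' implementation. PMI between a compound c and a pattern p is estimated from Web hit counts as log2(hits(c, p) · N / (hits(c) · hits(p))), where N is an assumed corpus size. The "sub-linear transformation" is assumed here to be log(1 + x) applied after clipping negative PMI to zero, and discretization is shown as plain equal-width binning. The abstract confirms none of these specific choices. All function names (`pmi`, `sublinear`, `fit_cuts`, `discretize`, `feature_vector`) and all counts are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def pmi(joint: int, compound: int, pattern: int, total: int) -> float:
    """Pointwise mutual information log2(P(c, p) / (P(c) * P(p))), with each
    probability estimated as a Web hit count divided by `total` (assumed corpus size)."""
    if joint == 0 or compound == 0 or pattern == 0:
        return 0.0  # assumption: an unseen co-occurrence is uninformative, not -inf
    return math.log2(joint * total / (compound * pattern))


def sublinear(x: float) -> float:
    """Assumed sub-linear transform: clip negative PMI to 0, then apply log(1 + x)."""
    return math.log1p(max(x, 0.0))


def fit_cuts(values: Sequence[float], k: int) -> List[float]:
    """Equal-width cut points over the training range: k bins give k - 1 cuts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


def discretize(x: float, cuts: Sequence[float]) -> int:
    """Bin index in 0..len(cuts); a value equal to a cut falls into the upper bin."""
    return bisect_right(cuts, x)


def feature_vector(compound_hits: int, joint_hits: Sequence[int],
                   pattern_hits: Sequence[int], total: int) -> List[float]:
    """One continuous feature per relation-specific pattern, before discretization."""
    return [sublinear(pmi(j, compound_hits, p, total))
            for j, p in zip(joint_hits, pattern_hits)]


if __name__ == "__main__":
    # Invented hit counts for one compound against three hypothetical patterns.
    vec = feature_vector(100, [10, 0, 1], [1000, 500, 10000], 1_000_000)
    # Cut points would normally be fit on the transformed training features.
    cuts = fit_cuts([0.0, 1.0, 2.0, 3.0], 3)
    print([round(v, 3) for v in vec], [discretize(v, cuts) for v in vec])
```

The discretized vectors would then be passed to an SVM implementation (for example LIBSVM) for the five-way classification, which this sketch does not show. Equal-width binning is used only because it is the simplest choice. A supervised discretizer that places cuts using the relation labels could equally well be substituted.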
