Classification of Verb Particle Constructions with the Google Web1T Corpus

Manually maintaining comprehensive databases of multi-word expressions, such as Verb-Particle Constructions (VPCs), is infeasible. We describe a new type-level classifier for potential VPCs that uses information in the Google Web1T corpus to perform a simple linguistic constituency test. Specifically, we consider the fronting test, comparing the frequencies of the two possible orderings of a given verb and particle. Using only a small set of queries for each verb-particle pair, the system achieved an F-score of 75.7% in our evaluation while processing thousands of queries per second.
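The ordering comparison behind the fronting test can be illustrated with a minimal sketch. It is not the system described above: the function names, the add-one smoothing, the 0.9 threshold, and the toy counts are all assumptions made for illustration, and a real system would look the counts up in Web1T rather than in a hand-written dictionary.

```python
from typing import Mapping


def fronting_ratio(counts: Mapping[str, int], verb: str, particle: str) -> float:
    """Share of occurrences in verb-particle order, with add-one smoothing.

    `counts` maps a two-word string to its frequency (hypothetical data).
    """
    normal = counts.get(f"{verb} {particle}", 0)
    fronted = counts.get(f"{particle} {verb}", 0)
    # Smoothing keeps unseen pairs at a neutral 0.5 instead of dividing by zero.
    return (normal + 1) / (normal + fronted + 2)


def looks_like_vpc(
    counts: Mapping[str, int], verb: str, particle: str, threshold: float = 0.9
) -> bool:
    """Flag a pair as a likely VPC when the fronted order is rare.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return fronting_ratio(counts, verb, particle) >= threshold


# Toy counts invented for this example; they are not Web1T figures.
toy_counts = {"give up": 9000, "up give": 12, "walk in": 500, "in walk": 400}

if __name__ == "__main__":
    for verb, particle in [("give", "up"), ("walk", "in")]:
        print(verb, particle, looks_like_vpc(toy_counts, verb, particle))
```

Here a pair whose particle rarely precedes the verb is treated as resisting fronting, which is the intuition the test relies on; the classifier reported above may combine several query patterns per pair instead of a single ratio.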
