From construction candidates to constructicon entries: An experiment using semi-automatic methods for identifying constructions in corpora

We present an experiment in which natural language processing tools are used to automatically identify potential constructions in a corpus. The experiment was conducted as part of the ongoing effort to develop a Swedish constructicon. Using an automatic method to suggest constructions has advantages not only in efficiency but also in methodology: it forces the analyst to look more objectively at the constructions that actually occur in corpora, rather than focusing only on “interesting” ones. As a heuristic for identifying potential constructions, the method has proved successful, yielding about 200 highly relevant construction candidates out of 1,200.
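The abstract does not say which heuristic the tools apply. Purely as an illustration, the sketch below shows one generic way to surface candidates. It enumerates "hybrid" bigram patterns, in which each slot holds either a word form or its part-of-speech tag, and ranks them by pointwise mutual information (PMI), a standard lexical association measure. The function names, the `min_freq` threshold, the input format, and the hand-tagged toy corpus are all hypothetical assumptions. The sketch should not be read as the method used in the experiment.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical input format: a sentence is a list of (word, pos) pairs.
Sentence = list[tuple[str, str]]
Unit = tuple[str, str]  # ("w", word form) or ("t", POS tag)


def units(token: tuple[str, str]) -> list[Unit]:
    """Return the two ways a token can fill a pattern slot: word form or POS tag."""
    word, pos = token
    return [("w", word.lower()), ("t", pos)]


def rank_hybrid_bigrams(
    corpus: list[Sentence], min_freq: int = 2
) -> list[tuple[tuple[Unit, Unit], int, float]]:
    """Rank hybrid bigram patterns by pointwise mutual information (PMI).

    PMI(a, b) = log2(P(a, b) / (P(a) * P(b))), where P(a) and P(b) are
    relative frequencies over tokens and P(a, b) is the relative frequency
    of the pattern over adjacent token pairs.
    """
    unit_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    n_tokens = 0
    n_pairs = 0
    for sent in corpus:
        n_tokens += len(sent)
        for tok in sent:
            unit_counts.update(units(tok))
        for left, right in zip(sent, sent[1:]):
            n_pairs += 1
            # Each adjacent pair yields four patterns: word-word, word-tag,
            # tag-word and tag-tag.
            pair_counts.update(product(units(left), units(right)))

    ranked = []
    for (a, b), freq in pair_counts.items():
        if freq < min_freq:
            continue  # rare patterns get unreliable, inflated PMI scores
        p_ab = freq / n_pairs
        p_a = unit_counts[a] / n_tokens
        p_b = unit_counts[b] / n_tokens
        ranked.append(((a, b), freq, math.log2(p_ab / (p_a * p_b))))
    # Strongest association first; ties broken by raw frequency.
    ranked.sort(key=lambda item: (item[2], item[1]), reverse=True)
    return ranked


# Invented toy corpus, tagged by hand with Penn-Treebank-style labels.
corpus = [
    [("the", "DT"), ("more", "JJR"), ("the", "DT"), ("better", "JJR")],
    [("the", "DT"), ("sooner", "RBR"), ("the", "DT"), ("better", "JJR")],
]
for pattern, freq, pmi in rank_hybrid_bigrams(corpus):
    print(pattern, freq, round(pmi, 2))
```

The interesting output is patterns that mix a fixed word with an open tag slot, such as `the` followed by a comparative. These partially schematic units are the kind of candidate a human analyst would still have to inspect and accept or reject. The frequency threshold trades recall for reliability, because PMI overrates patterns seen only once.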
