Federating clustering and cluster labelling capabilities with a single approach based on feature maximization: French verb classes identification with IGNGF neural clustering

Classifications which group together verbs and a set of shared syntactic and semantic properties have proven useful in both linguistics and Natural Language Processing tasks. However, most existing approaches for automatically acquiring verb classes fail to associate the classes they produce with an explicit characterisation of the syntactic and semantic properties shared by the class members. We propose a novel approach to verb clustering which addresses this shortcoming and permits building verb classifications whose classes group together verbs, subcategorisation frames and thematic grids. Our approach relies on a recent neural clustering method called IGNGF (Incremental Growing Neural Gas with Feature maximisation). In IGNGF, the standard distance measure used to determine a winner is replaced by a feature maximisation measure that relies on the features of the data associated with clusters during learning. A main advantage of the method is that the maximised features used by IGNGF during learning can also be exploited in a final step to label the resulting clusters accurately. In this paper, we use IGNGF for the unsupervised classification of French verbs and evaluate the resulting clusters (i.e., verb classes) in two different ways. The first is a quantitative analysis of the clustering process, relying on an established gold standard and on complementary unbiased clustering quality indexes. The second is a qualitative analysis of the cluster labelling process: relying on an adapted gold standard, we evaluate the capacity of the IGNGF cluster labels (i.e., subcategorisation frames and thematic grids) to be exploited for bootstrapping a VerbNet-like classification for French. Both analyses clearly highlight the advantages of the approach.
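
To make the labelling step more concrete, the following is a minimal Python sketch of how features might be attached to clusters by feature maximisation. The abstract does not give the formula, so the definition used here is an assumption: a per-cluster feature recall and feature precision combined into an F-measure, with each feature assigned to the cluster where that F-measure is highest. The function names, the toy frames and grids, and the cluster assignment are all hypothetical, and the sketch covers only the final labelling step, not the IGNGF learning procedure itself.

```python
from collections import defaultdict


def feature_f_measures(data, assignment):
    """Return {(cluster, feature): F-measure} for weighted feature vectors.

    Assumed definition (not taken from the abstract):
      recall    = weight of the feature in the cluster / its weight overall
      precision = weight of the feature in the cluster / total weight in the cluster
    """
    in_cluster = defaultdict(lambda: defaultdict(float))
    feature_total = defaultdict(float)
    cluster_total = defaultdict(float)
    for vector, cluster in zip(data, assignment):
        for feature, weight in vector.items():
            if weight <= 0:
                continue  # only positive weights contribute
            in_cluster[cluster][feature] += weight
            feature_total[feature] += weight
            cluster_total[cluster] += weight

    scores = {}
    for cluster, features in in_cluster.items():
        for feature, weight in features.items():
            recall = weight / feature_total[feature]
            precision = weight / cluster_total[cluster]
            scores[(cluster, feature)] = (
                2 * recall * precision / (recall + precision)
            )
    return scores


def label_clusters(scores):
    """Attach each feature to the cluster that maximises its F-measure."""
    best = {}
    for (cluster, feature), score in scores.items():
        if feature not in best or score > best[feature][1]:
            best[feature] = (cluster, score)
    labels = defaultdict(list)
    for feature, (cluster, _) in best.items():
        labels[cluster].append(feature)
    return {cluster: sorted(features) for cluster, features in labels.items()}


# Hypothetical toy data: four verbs described by frames and thematic grids.
data = [
    {"NP-V-NP": 1, "Agent-Theme": 1},
    {"NP-V-NP": 1, "Agent-Theme": 1},
    {"NP-V-PP": 1, "Agent-Location": 1},
    {"NP-V-PP": 1, "NP-V-NP": 1},
]
assignment = [0, 0, 1, 1]  # hypothetical cluster of each verb

scores = feature_f_measures(data, assignment)
labels = label_clusters(scores)
print(labels)
```

In this toy case the frame `NP-V-NP` occurs in both clusters but is kept as a label only for cluster 0, where its F-measure is higher. This illustrates how a single measure could serve both to characterise clusters and to select their labels.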
