The structure of verbal sequences analyzed with unsupervised learning techniques

Data mining makes it possible to explore sequences of phenomena, whereas analyses usually focus on isolated phenomena or on the relation between two phenomena. It offers valuable tools for the theoretical analysis and exploration of the structure of sentences, texts, dialogues, and speech. We report here the results of an attempt to use it to inspect sequences of verbs from French accounts of road accidents. The analysis relies on an original unsupervised learning approach that discovers the structure of sequential data. The input to the analyzer consisted solely of the verbs appearing in the sentences. The analyzer classified the links between two successive verbs into four distinct clusters, thus allowing text segmentation. We interpret these clusters by comparing the statistical distributions of independent semantic annotations.
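To make the notions of "link between two successive verbs" and cluster-based segmentation concrete, here is a minimal Python sketch. It does not reproduce the unsupervised learning step: the cluster labels are hypothetical placeholders standing in for the analyzer's output, the verb sequence is an invented example, and the rule that a segment boundary falls wherever the cluster label changes is an assumption made for illustration, not a rule stated above.

```python
from itertools import groupby


def verb_links(verbs):
    """Return the links (pairs) between successive verbs in a sequence."""
    return list(zip(verbs, verbs[1:]))


def segment(verbs, labels):
    """Group consecutive links that share a cluster label into segments.

    ASSUMPTION: a boundary is placed wherever the label changes; the
    actual segmentation rule is not specified in the text.
    """
    links = verb_links(verbs)
    if len(labels) != len(links):
        raise ValueError("need exactly one cluster label per link")
    segments = []
    for label, group in groupby(zip(labels, links), key=lambda pair: pair[0]):
        segments.append((label, [link for _, link in group]))
    return segments


# Hypothetical verb lemmas from an accident account, with hypothetical
# cluster labels in 0..3 (one label per link, i.e. len(verbs) - 1 labels).
verbs = ["rouler", "arriver", "freiner", "heurter", "constater"]
labels = [0, 0, 2, 1]
segments = segment(verbs, labels)
```

With these placeholder labels, the first two links fall into one segment and each of the remaining links forms its own segment. In a real pipeline, `labels` would come from the trained clustering model rather than being written by hand.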
