A Markovian Kernel-based Approach for itaLIan Speech acT labEliNg

This paper describes the UNITOR system, which participated in the itaLIan Speech acT labEliNg task at EvalIta 2018. A structured kernel-based Support Vector Machine (SVM) makes the classification of dialogue turns sensitive to the syntactic and semantic information of each utterance, without relying on any task-specific manual feature engineering. In addition, a Markovian formulation of the SVM is adopted, so that the label of each utterance depends on the speech acts assigned to the previous turns. The UNITOR system ranked first in the competition, suggesting that combining the adopted structured kernel with Markovian modeling is beneficial for developing robust dialogue systems.
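The Markovian idea, where the label of each turn depends on the labels assigned to earlier turns, can be illustrated with a minimal sketch. It is not the UNITOR implementation: it assumes a greedy left-to-right decoder rather than the SVM formulation used in the paper, and the names `add_history`, `label_dialogue`, `toy_classifier`, the feature keys, and the label strings are all hypothetical placeholders rather than the task's actual tagset.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def add_history(features: Features, previous_labels: List[str], order: int) -> Features:
    """Return a copy of `features` extended with the last `order` predicted labels."""
    augmented = dict(features)
    # Pad on the left so that the first turns of a dialogue still get history features.
    padded = ["<START>"] * order + previous_labels
    for offset in range(1, order + 1):
        augmented[f"prev{offset}={padded[-offset]}"] = 1.0
    return augmented


def label_dialogue(
    turns: List[Features],
    classify: Callable[[Features], str],
    order: int = 1,
) -> List[str]:
    """Greedily label each turn, exposing earlier predictions to the classifier."""
    labels: List[str] = []
    for features in turns:
        labels.append(classify(add_history(features, labels, order)))
    return labels


def toy_classifier(features: Features) -> str:
    """Hypothetical stand-in for a trained model (an assumption, not a real SVM)."""
    if features.get("has_question_mark"):
        return "QUESTION"
    if features.get("prev1=QUESTION"):
        return "ANSWER"
    return "STATEMENT"


dialogue = [{"has_question_mark": 1.0}, {"length": 3.0}, {"length": 5.0}]
predicted = label_dialogue(dialogue, toy_classifier)
print(predicted)  # ['QUESTION', 'ANSWER', 'STATEMENT']
```

The second turn is labeled as an answer only because the previous turn was labeled as a question, which is the kind of dependency a Markovian formulation captures. A trained classifier would take the place of `toy_classifier`, and a full sequence decoder could replace the greedy loop.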
