Extraction of Two-Dimensional Dialogical Patterns

ABSTRACT. This article addresses the problem of semi-automatic extraction of regularities in dialogues, in the form of dialogical patterns. We present a dynamic programming algorithm, running in O(mA × n × mB) time, that performs an unsupervised extraction of recurrent patterns from two-dimensional arrays of annotations representing dialogues (mA and mB are the numbers of rows of the two arrays considered, and n is their number of columns). Combined with a clustering heuristic, this algorithm yields regularities that are characteristic of an annotated corpus. The parameters of the method and the results obtained (i.e., the dialogical patterns and the pattern clusterings) are evaluated manually by an expert in an experiment.

KEYWORDS: regularity extraction, dialogical patterns, dialogue modelling.
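To make the stated O(mA × n × mB) bound concrete, here is a minimal sketch of one way a dynamic program over two annotation arrays can reach that cost. It is not the authors' algorithm, whose recurrence is not given in this abstract: it assumes a Smith-Waterman-style local alignment in which each row (one utterance with n annotation columns) is compared column by column. The names `row_score` and `best_local_alignment` and the `match`, `mismatch` and `gap` values are hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

Row = Sequence[str]  # one utterance: n annotation labels, one per column


def row_score(a: Row, b: Row, match: int = 1, mismatch: int = -1) -> int:
    """Column-wise similarity of two rows; costs O(n). (Assumed scoring.)"""
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def best_local_alignment(A: List[Row], B: List[Row], gap: int = -2) -> Tuple[int, int, int]:
    """Return (best score, end row in A, end row in B), rows counted from 1.

    The table has (mA + 1) x (mB + 1) cells and each cell needs one O(n)
    row comparison, hence O(mA x n x mB) overall.
    """
    m_a, m_b = len(A), len(B)
    H = [[0] * (m_b + 1) for _ in range(m_a + 1)]
    best = (0, 0, 0)
    for i in range(1, m_a + 1):
        for j in range(1, m_b + 1):
            score = max(
                0,                                                # start a new local pattern
                H[i - 1][j - 1] + row_score(A[i - 1], B[j - 1]),  # align the two rows
                H[i - 1][j] + gap,                                # skip a row of A
                H[i][j - 1] + gap,                                # skip a row of B
            )
            H[i][j] = score
            if score > best[0]:
                best = (score, i, j)
    return best


# Toy dialogues with n = 2 annotation columns (invented labels).
A = [("q", "x"), ("a", "y"), ("q", "z"), ("ack", "y")]
B = [("greet", "w"), ("q", "z"), ("ack", "y")]
print(best_local_alignment(A, B))  # (4, 4, 3)
```

In this toy case the two shared rows at the end of both dialogues form the best-scoring local pattern. Recovering the pattern itself would require a traceback through `H`, which is omitted here for brevity.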
