Extraction and clustering for regularity identification: application to dialogue analysis (original French title: Extraction et partitionnement pour la recherche de régularités : application à l'analyse de dialogues)

In the context of dialogue analysis support, a corpus of dialogues can be represented as a set of annotation tables encoding the utterances of the dialogues. To identify frequently used dialogue patterns, we define a two-step methodology: extraction of recurrent patterns, then partitioning of these patterns into homogeneous classes that constitute the regularities. Two methods are developed to extract recurrent patterns: LPCADC and SABRE. The first is an adaptation of a dynamic programming algorithm, whereas the second derives from a formal model of the problem of extracting local alignments in a pair of annotation tables.

The recurrent patterns are partitioned using various heuristics from the literature as well as two original formulations of the K-partitioning problem as integer linear programs. In a polyhedral study, we characterize facets of a polyhedron associated with these formulations (notably the 2-partition inequalities, the 2-chorded cycle inequalities and the generalized clique inequalities). These theoretical results lead to a cutting-plane algorithm that solves the problem efficiently.

We develop the decision-support software VIESA, which implements these methods and allows them to be evaluated in two experiments carried out by an expert psychologist. Regularities corresponding to dialogue strategies that manual extraction had failed to reveal are thereby identified.
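
To give a concrete idea of what extracting a local alignment by dynamic programming involves, here is a minimal sketch of the classic Smith-Waterman recurrence on two one-dimensional sequences of labels. It is not LPCADC or SABRE, which work on two-dimensional annotation tables and whose details are not given in this abstract. The function name `local_alignment_score`, the scoring parameters, and the dialogue-act labels in the usage note are all hypothetical, chosen only for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, (i, j)) for the best local alignment of a and b.

    Classic Smith-Waterman recurrence; (i, j) is the 1-based cell where the
    best-scoring alignment ends. The scoring values are illustrative
    assumptions, not values taken from the abstract.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of an alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            same = a[i - 1] == b[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            # Clamping at 0 is what makes the alignment local:
            # a poor prefix is dropped rather than carried along.
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_end = score[i][j], (i, j)
    return best, best_end
```

For example, with the made-up label sequences `["greet", "ask", "answer", "thank"]` and `["ask", "answer", "close"]`, the shared run `ask, answer` is the best local alignment, and the function returns `(4, (3, 2))`.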
