Divisive Property-Based and Fuzzy Clustering for Sequence Analysis

This paper discusses the usefulness of divisive property-based and fuzzy clustering for sequence analysis. Divisive property-based clustering provides well-defined cluster membership rules. Besides greatly simplifying the interpretation of the clustering, such rules are useful when one plans to apply the same typology to other samples or studies. We further enrich the method by proposing new sets of sequence features that can be automatically extracted and used in the procedure. We then discuss the use of fuzzy clustering, in which each sequence belongs to every cluster with an estimated membership strength. This approach is particularly useful when some sequences are thought to lie between two (or more) sequence types (i.e., hybrid-type sequences) or when only a weak structure is found in the data. The paper also discusses several methods for visualizing a fuzzy clustering solution and for analyzing it with regression-like approaches. Finally, it introduces R code to run all the discussed analyses; all the proposed developments are available in the WeightedCluster R package.
