Spell Sequences, State Proximities, and Distance Metrics

Because optimal matching (OM) distance is not very sensitive to differences in the order of states, we introduce a subsequence-based distance measure that can be adapted to subsequence length, to subsequence duration, and to soft-matching of states. Using a simulation technique developed by Studer, we investigate how sensitive several variants of this metric are, relative to OM, to variations in the order, timing, and duration of states. The results show that the metric behaves as intended. Furthermore, we use family formation data from the Swiss Household Panel to compare a few variants of the new metric to OM. The new metrics have been implemented in the freely available TraMineR package.
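
To make the idea of a subsequence-based distance concrete, the following is a minimal Python sketch, not the TraMineR implementation and not necessarily the exact metric studied here. It assumes the simplest variant: every common subsequence counts equally, with no weighting by subsequence length or duration and no soft-matching of states. The function names `subseq_kernel` and `subseq_distance` are hypothetical, chosen for illustration only.

```python
import math


def subseq_kernel(x, y):
    """Count matching pairs of subsequence embeddings in x and y.

    This is the standard all-subsequences kernel (the empty subsequence
    is included), computed by dynamic programming in O(len(x) * len(y)).
    """
    n, m = len(x), len(y)
    # k[i][j] holds the count for the prefixes x[:i] and y[:j];
    # any prefix shares the empty subsequence with an empty prefix.
    k = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Pairs that do not end at both x[i-1] and y[j-1].
            k[i][j] = k[i - 1][j] + k[i][j - 1] - k[i - 1][j - 1]
            # Pairs that end at both positions require equal states.
            if x[i - 1] == y[j - 1]:
                k[i][j] += k[i - 1][j - 1]
    return k[n][m]


def subseq_distance(x, y):
    """Distance induced by the kernel: sqrt(k(x,x) + k(y,y) - 2 k(x,y))."""
    squared = subseq_kernel(x, x) + subseq_kernel(y, y) - 2 * subseq_kernel(x, y)
    return math.sqrt(max(squared, 0))


# Hypothetical family-formation sequences: S = single, M = married, C = child.
a = ["S", "M", "C"]
b = ["S", "C", "M"]  # same states as a, different order
print(subseq_distance(a, a), subseq_distance(a, b))
```

Because the count depends on which ordered subsequences two sequences share, swapping the order of two states changes the distance even when both sequences contain the same states.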
