Harpoon or Bait? A Comparison of Various Metrics in Fishing for Sequence Patterns

The use of sequence analysis in the social sciences has significantly increased during the last decade or two. Sequence analysis explores and describes trajectories and “fishes for patterns” (Abbott, 2000). Many dissimilarity metrics exist in various domains (bioinformatics, data mining, etc.); therefore a crucial and pervasive issue in papers using sequence analysis is robustness. To what extent do the various techniques lead to consistent and converging results? What kinds of patterns are more easily fished out by each of the metrics? Here we propose a systematic comparison of about ten metrics that have been used in the social science literature, based on the examination of dissimilarity matrices computed from a simulated sequence data set including various patterns that sociologists can try to identify. This should help scholars in picking the method best suited to their data design and inquiry objectives.
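As a small illustration of why the choice of metric matters, the sketch below computes two dissimilarity matrices on three toy state sequences: one with the Hamming distance (position-by-position mismatches) and one with a unit-cost edit distance of the kind optimal matching builds on. The sequences, the state labels `E` and `U`, the unit costs, and all function names are assumptions made for this example; they are not the simulated data set or the exact metric settings compared in the paper.

```python
from itertools import combinations


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b, indel=1, sub=1):
    """Edit distance by dynamic programming (illustrative unit costs)."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,                      # delete x from a
                curr[j - 1] + indel,                  # insert y into a
                prev[j - 1] + (0 if x == y else sub)  # match or substitute
            ))
        prev = curr
    return prev[-1]


def dissimilarity_matrix(seqs, metric):
    """Symmetric matrix of pairwise dissimilarities."""
    n = len(seqs)
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = metric(seqs[i], seqs[j])
    return d


# Hypothetical toy trajectories: E = employed, U = unemployed.
sequences = [
    "EUEUEUEU",  # alternating pattern
    "UEUEUEUE",  # same pattern shifted by one period
    "EUEUEUEE",  # same timing, last state changed
]

hamming_matrix = dissimilarity_matrix(sequences, hamming)
edit_matrix = dissimilarity_matrix(sequences, edit_distance)

for name, matrix in (("Hamming", hamming_matrix), ("Edit", edit_matrix)):
    print(name)
    for row in matrix:
        print(" ", row)
```

In this toy case the Hamming distance treats the shifted sequence as maximally different from the first one (8 mismatches out of 8 positions), whereas the edit distance places it only 2 operations away, because one deletion and one insertion realign the pattern. The two metrics would therefore group these sequences differently.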
