Optimal Bayesian estimators for latent variable cluster models

In cluster analysis, interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that members of the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be obtained effectively in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical nature of the clustering variables and the lack of scalable algorithms, summary tools that can interpret such samples are not available. We adopt a Bayesian decision-theoretic approach to define an optimality criterion for clusterings, and propose a fast and context-independent greedy algorithm to find the best allocations. One important facet of our approach is that the optimal number of groups is selected automatically, thereby solving the clustering and model-choice problems simultaneously. We consider several loss functions for comparing partitions and show that our approach can accommodate a wide range of cases. Finally, we illustrate our approach on both artificial and real datasets for three different clustering models: Gaussian mixtures, stochastic block models and latent block models for networks.
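The decision-theoretic idea described above can be sketched in a few lines: given posterior samples of the allocation vector, search for the partition that minimises the Monte Carlo estimate of the posterior expected loss. The sketch below is only an illustration under stated assumptions, not the algorithm proposed in the paper. It assumes a pairwise-disagreement (Binder-type) loss as one possible choice of loss function, it uses a plain single-item reassignment sweep as the greedy search, and the names `binder_loss`, `expected_loss` and `greedy_partition` are hypothetical.

```python
import numpy as np

def binder_loss(a, b):
    """Count the item pairs on which partitions a and b disagree
    (together in one partition, apart in the other)."""
    a, b = np.asarray(a), np.asarray(b)
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    return int(np.triu(same_a != same_b, k=1).sum())

def expected_loss(z, samples):
    """Monte Carlo estimate of the posterior expected loss of partition z."""
    return float(np.mean([binder_loss(z, s) for s in samples]))

def greedy_partition(samples, max_sweeps=20, seed=0):
    """Illustrative greedy search (an assumption, not the paper's method):
    move one item at a time to whichever group lowers the expected loss."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples)
    n = samples.shape[1]
    z = np.arange(n)  # start with every item in its own group
    best = expected_loss(z, samples)
    for _ in range(max_sweeps):
        improved = False
        for i in rng.permutation(n):
            # Candidate labels: every current group plus one fresh group.
            candidates = list(np.unique(z)) + [z.max() + 1]
            for g in candidates:
                if g == z[i]:
                    continue
                trial = z.copy()
                trial[i] = g
                loss = expected_loss(trial, samples)
                if loss < best:  # accept strictly improving moves only
                    z, best, improved = trial, loss, True
        if not improved:
            break
    # Relabel the surviving groups as 0, 1, ..., K-1.
    _, z = np.unique(z, return_inverse=True)
    return z, best

if __name__ == "__main__":
    # Toy posterior sample: item 2 mostly sits with items 0 and 1.
    samples = [[0, 0, 0, 1, 1]] * 8 + [[0, 0, 1, 1, 1]] * 2
    z, loss = greedy_partition(samples)
    print("allocations:", z, "groups:", len(set(z)), "expected loss:", loss)
```

Note that the number of groups is not supplied anywhere: it is simply the number of distinct labels left in the returned partition, which mirrors the point made above that clustering and model choice are handled together. Recomputing the full expected loss for every trial move is wasteful and is done here only for clarity.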
