Assessing the Number of Components in Mixture Models: a Review

Despite the widespread application of finite mixture models, deciding how many classes are required to represent the data adequately is, according to many authors, an important but unsolved issue. This work reviews, describes, and organizes the available approaches designed to help select the number of mixture components, including Monte Carlo test procedures, information criteria, and classification-based criteria. We also report published simulation results on their relative performance, with the aim of identifying the scenarios in which each criterion is most effective.
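As a minimal illustration of how information criteria and classification-based criteria rank candidate numbers of components, the sketch below scores models that have already been fitted. It assumes a univariate Gaussian mixture, so a K-component model has d = 3K - 1 free parameters. It also assumes that the maximized log-likelihood and the posterior membership matrix τ come from an EM fit carried out elsewhere, and it uses the "smaller is better" convention. The function names and the toy log-likelihood values in the demo are hypothetical and are not results from the review.

```python
import math
from typing import Dict, List, Sequence, Tuple


def n_params_univariate_gmm(k: int) -> int:
    """Free parameters of a univariate Gaussian mixture with k components:
    k means, k variances and k - 1 free mixing proportions (assumed model)."""
    return 3 * k - 1


def aic(loglik: float, d: int) -> float:
    """Akaike information criterion, -2 log L + 2d (smaller is better)."""
    return -2.0 * loglik + 2.0 * d


def bic(loglik: float, d: int, n: int) -> float:
    """Bayesian (Schwarz) information criterion, -2 log L + d log n."""
    return -2.0 * loglik + d * math.log(n)


def entropy(tau: Sequence[Sequence[float]]) -> float:
    """Classification entropy EN(tau) = -sum_i sum_k tau_ik log tau_ik,
    with the convention 0 log 0 = 0. Each row of tau holds the posterior
    membership probabilities of one observation."""
    return -sum(t * math.log(t) for row in tau for t in row if t > 0.0)


def icl_bic(loglik: float, d: int, n: int, tau: Sequence[Sequence[float]]) -> float:
    """BIC plus twice the classification entropy: penalizes fits whose
    observations are not clearly assigned to a single component."""
    return bic(loglik, d, n) + 2.0 * entropy(tau)


def select_k(
    fits: Dict[int, Tuple[float, List[List[float]]]], n: int, criterion: str
) -> int:
    """Return the number of components that minimizes the chosen criterion.

    fits maps k -> (maximized log-likelihood, posterior matrix tau)."""
    scores = {}
    for k, (loglik, tau) in fits.items():
        d = n_params_univariate_gmm(k)
        if criterion == "aic":
            scores[k] = aic(loglik, d)
        elif criterion == "bic":
            scores[k] = bic(loglik, d, n)
        elif criterion == "icl-bic":
            scores[k] = icl_bic(loglik, d, n, tau)
        else:
            raise ValueError(f"unknown criterion: {criterion}")
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical toy inputs for n = 100 observations: the two-component
    # fit raises the log-likelihood but leaves every observation split 50/50.
    toy_fits = {1: (-250.0, [[1.0]] * 100), 2: (-200.0, [[0.5, 0.5]] * 100)}
    for name in ("aic", "bic", "icl-bic"):
        print(name, select_k(toy_fits, 100, name))
    # prints: aic 2, bic 2, icl-bic 1 (one per line)
```

With these toy inputs, AIC and BIC both favour two components, whereas the entropy term in ICL-BIC penalizes the two-component fit, whose posterior probabilities are all 0.5, and selects a single component. Disagreements of this kind between likelihood-based and classification-based criteria are what comparative simulation studies try to map to specific scenarios.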
