Mixture structure analysis using the Akaike Information Criterion and the bootstrap

Given i.i.d. observations x_1, x_2, ..., x_n drawn from a mixture of normal terms, one is often interested in determining the number of terms in the mixture and their defining parameters. Although determining the number of terms is intractable under the most general assumptions, there is hope of elucidating the mixture structure under appropriate restrictions on the underlying mixture. This paper examines a new approach to the problem: data-driven mixture models are obtained from resampled (bootstrap) data sets and then pruned using the Akaike Information Criterion (AIC). Results of applying this procedure to artificially generated data sets and to a real-world data set are provided.
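
The general idea of combining AIC with resampling can be illustrated with a minimal sketch. It is not the pruning procedure examined in the paper, whose details are not given here. As a simpler stand-in, it fits univariate normal mixtures by plain EM for each candidate number of terms k, scores each fit with AIC = -2 log L + 2p (with p = 3k - 1 free parameters for a k-term univariate normal mixture), and tallies which k wins on each bootstrap resample. All function names, the quantile-based initialization, the variance floor, and the default settings are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """Density of the normal mixture at x."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_mixture_em(data, k, iters=50, var_floor=1e-3):
    """Fit a k-term univariate normal mixture by EM (illustrative, not the paper's fitter).

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    srt = sorted(data)
    # Assumed deterministic start: means at evenly spaced quantiles, pooled variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    pooled = max(sum((x - centre) ** 2 for x in data) / n, var_floor)
    variances = [pooled] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability of each term for each observation.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = max(sum(p), 1e-300)
            resp.append([pj / s for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)  # floor guards against collapsing terms
    loglik = sum(math.log(max(mixture_pdf(x, weights, means, variances), 1e-300)) for x in data)
    return weights, means, variances, loglik


def aic(loglik, k):
    """AIC for a k-term univariate normal mixture: k means, k variances, k - 1 weights."""
    return -2.0 * loglik + 2.0 * (3 * k - 1)


def bootstrap_aic_votes(data, max_k=4, n_boot=20, seed=0):
    """Count, over bootstrap resamples, how often each k in 1..max_k minimizes AIC."""
    rng = random.Random(seed)
    votes = {k: 0 for k in range(1, max_k + 1)}
    for _ in range(n_boot):
        # Resample with replacement, same size as the original data set.
        sample = [rng.choice(data) for _ in data]
        best_k = min(votes, key=lambda k: aic(fit_mixture_em(sample, k)[3], k))
        votes[best_k] += 1
    return votes


if __name__ == "__main__":
    gen = random.Random(1)
    # Artificial data: two well-separated normal terms.
    data = [gen.gauss(0.0, 1.0) for _ in range(100)] + [gen.gauss(8.0, 1.0) for _ in range(100)]
    print(bootstrap_aic_votes(data, max_k=3, n_boot=5))
```

The vote tally gives a rough picture of how stable the AIC-selected number of terms is under resampling. A spread of votes across several values of k suggests that the data do not strongly determine the mixture structure.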
