Joint determination of optimal stratification and sample allocation using genetic algorithm

This paper offers a solution to the problem of finding the optimal stratification of the available population frame, so as to minimize the cost of the sample required to satisfy precision constraints on a set of different target estimates. The solution is sought by exploring the universe of all possible stratifications obtainable by cross-classifying the categorical auxiliary variables available in the frame (continuous auxiliary variables can be transformed into categorical ones by means of suitable methods). The approach is therefore multivariate with respect to both target and auxiliary variables. The proposed algorithm is based on a non-deterministic evolutionary approach that uses the genetic algorithm paradigm. The key feature of the algorithm is that it treats each possible stratification as an individual subject to evolution, whose fitness is given by the cost of the associated sample required to satisfy a set of precision constraints; this cost is calculated by applying the Bethel algorithm for multivariate allocation. This optimal stratification algorithm, implemented in an R package (SamplingStrata), has so far been applied to a number of current surveys at the Italian National Institute of Statistics: the results obtained always show significant improvements in sample efficiency with respect to the previously adopted stratifications.
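The idea of treating each stratification as an individual whose fitness is the required sample size can be illustrated with a minimal Python sketch. It is not the SamplingStrata implementation: the function names (`required_sample_size`, `merge`, `evolve`), the synthetic frame, and all parameter values are hypothetical, and the Bethel multivariate allocation is replaced by the much simpler closed-form Neyman allocation for a single target variable. Each individual is a vector assigning every atomic stratum (one cell of the cross-classification) to an aggregated stratum.

```python
import math
import random
from statistics import variance

def required_sample_size(strata, cv_target):
    """Sample size meeting a CV constraint on the estimated total.

    Assumption: single target variable with Neyman allocation,
    used here as a stand-in for the Bethel multivariate algorithm.
    """
    total = sum(sum(s) for s in strata)
    target_var = (cv_target * total) ** 2
    sum_ns = 0.0   # sum over strata of N_h * S_h
    sum_ns2 = 0.0  # sum over strata of N_h * S_h^2
    for s in strata:
        s2 = variance(s) if len(s) > 1 else 0.0
        sum_ns += len(s) * math.sqrt(s2)
        sum_ns2 += len(s) * s2
    n = sum_ns ** 2 / (target_var + sum_ns2)
    # Assumption: at least one sampled unit per stratum.
    return max(math.ceil(n), len(strata))

def merge(atoms, genome):
    """Aggregate atomic strata that share the same label in the genome."""
    merged = {}
    for values, label in zip(atoms, genome):
        merged.setdefault(label, []).extend(values)
    return list(merged.values())

def fitness(atoms, genome, cv_target):
    """Cost of a stratification: here simply the required sample size."""
    return required_sample_size(merge(atoms, genome), cv_target)

def evolve(atoms, cv_target, pop_size=20, generations=50,
           mutation_rate=0.1, seed=0):
    """Naive genetic search over groupings of atomic strata."""
    rng = random.Random(seed)
    k = len(atoms)
    # Start from the single-stratum solution plus random groupings.
    population = [[0] * k] + [
        [rng.randrange(k) for _ in range(k)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(atoms, g, cv_target))
        parents = population[: pop_size // 2]  # elitism: best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, k)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g
                     for g in child]           # random label mutation
            children.append(child)
        population = parents + children
    best = min(population, key=lambda g: fitness(atoms, g, cv_target))
    return best, fitness(atoms, best, cv_target)

# Synthetic frame: six atomic strata with clearly different means.
data_rng = random.Random(1)
atoms = [[data_rng.gauss(mu, 5) for _ in range(200)]
         for mu in (10, 12, 50, 55, 200, 210)]

best, cost = evolve(atoms, cv_target=0.01)
baseline = fitness(atoms, [0] * len(atoms), cv_target=0.01)
print("best grouping:", best)
print("required sample size:", cost, "vs. no stratification:", baseline)
```

Because the best half of the population is carried over unchanged, the best cost never increases from one generation to the next, so the result is never worse than the unstratified baseline included in the initial population. The one-point crossover on label vectors is deliberately naive; it ignores the fact that different labelings can encode the same partition.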
