A Tree-Based Approach to Forming Strata in Multipurpose Business Surveys

The design of a stratified sample from a finite population involves two main issues: the definition of a rule to partition the population into strata, and the allocation of sampling units to the selected strata. This article examines a tree-based strategy that aims to address these issues jointly when the survey is multipurpose and multivariate information, quantitative or qualitative, is available. Strata are formed through a hierarchical divisive algorithm that selects finer and finer partitions by minimizing, at each step, the sample size required to achieve the precision levels set for each surveyed variable. In this way, a large number of constraints can be satisfied without drastically increasing the sample size, and without discarding variables selected for stratification or reducing the number of their class intervals. Furthermore, the algorithm tends not to define empty or almost empty strata, thus avoiding the need for ex post aggregation of strata. The procedure was applied to redesign the Italian Farm Structure Survey. The results indicate that the gain in efficiency achieved with our strategy is nontrivial. For a given sample size, the procedure achieves the required precision using a number of strata that is usually a very small fraction of the number obtained by cross-classifying all the classes of all the covariates.
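The idea of growing strata by repeated splitting can be illustrated with a minimal Python sketch. It is not the procedure used in the article, and it rests on several simplifying assumptions: each split is binary (one covariate level against the rest of the stratum); the sample size for a partition is approximated by the largest univariate Neyman sample size across the target variables, which is only a crude proxy for a genuine multivariate allocation; and a hypothetical `min_size` threshold is imposed to keep strata from becoming almost empty. All function names, field names, and data values are illustrative.

```python
import math

def stratum_stats(units, var):
    """Return (N_h, S_h^2) for target variable `var` within one stratum."""
    values = [u[var] for u in units]
    n = len(values)
    if n < 2:
        return n, 0.0
    mean = sum(values) / n
    return n, sum((v - mean) ** 2 for v in values) / (n - 1)

def required_n(strata, targets):
    """Largest Neyman sample size over the target variables.

    `targets` maps each variable to the target CV of its estimated total.
    Assumption: taking the maximum of univariate sizes stands in for a
    proper multivariate allocation.
    """
    worst = 0.0
    for var, cv in targets.items():
        total = sum(u[var] for s in strata for u in s)
        v_target = (cv * total) ** 2          # target variance of the total
        sum_ns = sum_ns2 = 0.0
        for s in strata:
            n_h, s2 = stratum_stats(s, var)
            sum_ns += n_h * math.sqrt(s2)
            sum_ns2 += n_h * s2
        if sum_ns > 0.0:
            # n = (sum N_h S_h)^2 / (V + sum N_h S_h^2), without replacement
            worst = max(worst, sum_ns ** 2 / (v_target + sum_ns2))
    return worst

def grow_strata(population, covariates, targets, min_size=2):
    """Greedy divisive search: apply the split that most reduces required_n."""
    strata = [list(population)]
    best_n = required_n(strata, targets)
    while True:
        best = None
        for i, s in enumerate(strata):
            for cov in covariates:
                for level in sorted({u[cov] for u in s}):
                    left = [u for u in s if u[cov] == level]
                    right = [u for u in s if u[cov] != level]
                    if len(left) < min_size or len(right) < min_size:
                        continue  # hypothetical guard against tiny strata
                    candidate = strata[:i] + [left, right] + strata[i + 1:]
                    n = required_n(candidate, targets)
                    if n < best_n - 1e-9:
                        best_n, best = n, candidate
        if best is None:           # no split lowers the sample size: stop
            return strata, best_n
        strata = best

# Illustrative data only: two regions with very different farm sizes.
farms = [{"region": "A", "area": float(a)} for a in range(1, 11)]
farms += [{"region": "B", "area": float(a)} for a in range(101, 111)]
targets = {"area": 0.05}           # 5% CV on the estimated total area

n_unstratified = required_n([farms], targets)
strata, n_stratified = grow_strata(farms, ["region"], targets)
print(len(strata), n_unstratified, n_stratified)
```

In this toy population the split by region removes most of the variability in `area`, so the stratified design needs far fewer units than the unstratified one. The search stops once no admissible split lowers the required sample size, which is why the number of strata can stay well below the full cross-classification of covariate classes.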