Creation of Synthetic Microdata for Data Envelopment Analysis Using Nondominated Sorting

One of the major objectives of the Conservation Effects Assessment Program (CEAP) of the Agricultural Research Service of the United States Department of Agriculture is to calculate the trade-offs among farm profitability, environmental quality, and program efficiency. This analysis requires farm-level production data for economic modeling. The best source of such data is the Census of Agriculture, but the records for individual farms are confidential and cannot be used directly for the CEAP analysis. This study presents a method of creating synthetic data using constrained draws from a Bayesian network. The environmental studies that will be implemented with the synthetic data require that the sum of each variable in the synthetic data closely match the observed sum within a watershed. These constraints are applied in a novel application of the nondominated sorting genetic algorithm (NSGA-II), and the resulting synthetic data are shown to protect the confidentiality of the original Census of Agriculture records. To assess the suitability of the synthetic data for economic analysis, we show that the best-practice production frontier calculated with data envelopment analysis from the synthetic microdata is statistically indistinguishable from the frontier calculated from the original data. These results support the use of synthetic data sets created with this method in the multiple-objective economic analysis of CEAP.
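
To make the sum-matching constraint concrete, the following is a minimal sketch of the nondominated ranking step that NSGA-II builds on. It is not the implementation used in this study. It assumes, purely for illustration, that each candidate synthetic data set is scored by one objective per variable: the absolute gap between its column sum and the observed watershed total. The function names, the totals, and the three candidate data sets are all hypothetical, and the crowding-distance, crossover, and mutation steps of the full algorithm are omitted.

```python
def sum_deviations(records, observed_totals):
    """Absolute gap between each variable's synthetic sum and its observed total."""
    sums = [sum(column) for column in zip(*records)]
    return tuple(abs(s - t) for s, t in zip(sums, observed_totals))


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def nondominated_front(objectives):
    """Indices of candidates that no other candidate dominates (minimization)."""
    return [
        i
        for i, a in enumerate(objectives)
        if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)
    ]


# Hypothetical watershed totals for two variables.
observed_totals = (100.0, 50.0)

# Hypothetical candidate synthetic data sets: each row is one synthetic farm.
candidates = [
    [[60.0, 20.0], [40.0, 25.0]],  # matches variable 1, misses variable 2
    [[55.0, 30.0], [40.0, 20.0]],  # misses variable 1, matches variable 2
    [[50.0, 20.0], [40.0, 20.0]],  # misses both, and by more
]

objectives = [sum_deviations(c, observed_totals) for c in candidates]
front = nondominated_front(objectives)
```

In this toy case the first two candidates form the nondominated front, because each matches one total exactly while missing the other, and the third is dominated by both. Treating each variable's deviation as a separate objective avoids having to choose weights for combining the constraints into a single penalty.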