Simulation of massive public health data by power polynomials

Studies that collect multiple outcomes and predictors of different distributional types are increasingly common in public health practice, and joint modeling of mixed data types has gained popularity in recent years. Evaluating the statistical techniques developed for mixed data in simulated environments requires generating multiple variables jointly. Most massive public health data sets include variables of different types; in clustered or longitudinal designs, for instance, multiple variables are often measured for each individual or at each occasion. This work is motivated by the need to jointly generate binary and possibly non-normal continuous variables. We illustrate the use of power polynomials to simulate multivariate mixed data, basing the illustration on a real adolescent smoking study. We believe the proposed technique for simulating such intensive data can be a useful methodological addition to the public health researcher's toolkit.
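To make the idea concrete, the following is a minimal sketch of how one binary and one possibly non-normal continuous variable might be generated from a pair of correlated standard normal deviates. It is not the procedure described in the paper. It assumes a third-order (Fleishman-type) power polynomial with constant term -c for the continuous variable and simple thresholding for the binary variable. The function names, the coefficient values, and the parameter `rho` are hypothetical; `rho` is the correlation of the underlying normals, not the correlation of the final binary-continuous pair, and in practice the coefficients would be solved to match target skewness and kurtosis.

```python
import math
import random
from statistics import NormalDist


def power_polynomial(z, b, c, d):
    """Cubic power-polynomial transform of a standard normal deviate z.

    The constant term is set to -c so the result has mean zero.
    """
    return -c + b * z + c * z**2 + d * z**3


def polynomial_variance(b, c, d):
    """Variance of the transformed variable when z is standard normal."""
    return b**2 + 6 * b * d + 2 * c**2 + 15 * d**2


def simulate_mixed(n, rho, p, b, c, d, seed=0):
    """Draw n (binary, continuous) pairs from correlated normals.

    rho : correlation of the two underlying normals (an assumption of
          this sketch; it is NOT the final correlation of the pair).
    p   : target probability that the binary variable equals 1.
    b, c, d : polynomial coefficients, supplied by the caller.
    """
    rng = random.Random(seed)
    threshold = NormalDist().inv_cdf(1 - p)  # P(Z > threshold) = p
    rows = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # Build a second normal with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1 - rho**2) * rng.gauss(0, 1)
        y_cont = power_polynomial(z1, b, c, d)  # non-normal continuous
        y_bin = 1 if z2 > threshold else 0      # dichotomized normal
        rows.append((y_bin, y_cont))
    return rows


if __name__ == "__main__":
    # Arbitrary illustrative coefficients, not fitted to any data set.
    sample = simulate_mixed(n=5, rho=0.5, p=0.3, b=0.9, c=0.2, d=0.02)
    for y_bin, y_cont in sample:
        print(y_bin, round(y_cont, 3))
```

With b = 1 and c = d = 0 the transform reduces to the identity, so the continuous variable is exactly standard normal; nonzero c and d introduce skewness and heavier or lighter tails.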
