Positing, fitting, and selecting regression models for pooled biomarker data

Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when assays are subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. In this paper, we develop and compare parametric regression models for skewed outcome data subject to pooling, including a novel parameterization of the gamma distribution that takes full advantage of the gamma summation property. We also develop a Monte Carlo approximation of Akaike's Information Criterion (AIC) for pooled data to guide model selection. Simulation studies and analysis of motivating data from the Collaborative Perinatal Project suggest that using AIC to select the best parametric model can help ensure valid inference and improve the precision of estimates.
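As a rough illustration of the gamma summation property mentioned above (a sum of independent gamma variates with a common scale is again gamma, with shape equal to the sum of the individual shapes), the following sketch simulates pooled totals and compares their sample mean and variance with the implied values. The shape and scale values and the function name `simulate_pool_sums` are hypothetical, chosen only for illustration. This is not the paper's parameterization or its regression model, and matching two moments is a sanity check rather than a proof of the distributional result.

```python
import random
import statistics


def simulate_pool_sums(shapes, scale, n_pools, seed=0):
    """Draw n_pools pooled totals, each the sum of independent gamma
    variates with the given shapes and one common scale."""
    rng = random.Random(seed)
    return [
        sum(rng.gammavariate(a, scale) for a in shapes)
        for _ in range(n_pools)
    ]


# Hypothetical values: three specimens per pool, common scale.
shapes = [1.5, 2.0, 3.5]
scale = 2.0
sums = simulate_pool_sums(shapes, scale, n_pools=20000)

# If the total is Gamma(sum(shapes), scale), its mean is
# sum(shapes) * scale and its variance is sum(shapes) * scale**2.
expected_mean = sum(shapes) * scale
expected_var = sum(shapes) * scale ** 2

sample_mean = statistics.fmean(sums)
sample_var = statistics.variance(sums)

print("mean:", sample_mean, "expected:", expected_mean)
print("variance:", sample_var, "expected:", expected_var)
```

The common scale is the key assumption here: when the scales differ across specimens, the sum is in general no longer gamma distributed.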
