Exploratory quantile regression with many covariates: an application to adverse birth outcomes.

Covariates may affect a continuous response differently at different points of its distribution. For example, an exposure might have minimal impact on the conditional mean yet sharply lower the conditional 10th percentile. Such differential effects can be important to detect. In studies of the determinants of birth weight, for instance, it is critical to identify exposures of this kind, since low birth weight is a risk factor for later health problems. Effects of covariates on the tails of a distribution can be obscured by models that estimate conditional means, such as linear regression; quantile regression, however, can detect them. We present two approaches for exploring high-dimensional predictor spaces to identify important predictors for quantile regression, one based on the lasso penalty and one on the elastic net penalty. We apply the approaches to a prospective cohort study of adverse birth outcomes that includes a wide array of demographic, medical, psychosocial, and environmental variables. Although tobacco exposure is known to be associated with lower birth weights, the analysis suggests an interaction effect not previously reported: tobacco exposure depresses the 20th and 30th percentiles of birth weight more strongly when mothers have high blood lead levels than when they have low blood lead levels.
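The abstract does not state the estimating equations, so the following is only a minimal sketch of what lasso-penalized quantile regression commonly looks like, not the authors' implementation. It assumes the standard check (pinball) loss at quantile level tau plus an L1 penalty on the non-intercept coefficients; the names `check_loss`, `penalized_objective`, `best_constant`, and the tuning parameter `lam` are hypothetical. An elastic net variant would, under the same assumptions, add a squared penalty on the coefficients.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def penalized_objective(beta, X, y, tau, lam):
    """Total check loss plus an L1 (lasso) penalty, assumed form.

    beta[0] is the intercept (left unpenalized); X is a list of covariate rows.
    """
    fit = 0.0
    for row, yi in zip(X, y):
        pred = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        fit += check_loss(yi - pred, tau)
    return fit + lam * sum(abs(b) for b in beta[1:])


def best_constant(y, tau):
    """Intercept-only fit: the observed value that minimizes total check loss."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(best_constant(y, 0.2))  # prints 2, a low quantile of y
print(best_constant(y, 0.5))  # prints 5, the median of y
```

With no covariates, minimizing the check loss recovers a sample quantile, which is why the same loss with covariates targets conditional quantiles rather than conditional means. Increasing `lam` shrinks coefficients toward zero, which is what makes the penalty usable for screening many candidate predictors.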
