Fast, Optimal, and Targeted Predictions Using Parameterized Decision Analysis

Prediction is critical for decision-making under uncertainty and lends validity to statistical inference. With targeted prediction, the goal is to optimize predictions for specific decision tasks of interest, which we represent via functionals. Although classical decision analysis extracts predictions from a Bayesian model, these predictions are often difficult to interpret and slow to compute. Instead, we design a class of parametrized actions for Bayesian decision analysis that produce optimal, scalable, and simple targeted predictions. For a wide variety of action parametrizations and loss functions (including linear actions with sparsity constraints for targeted variable selection), we derive a convenient representation of the optimal targeted prediction that yields efficient and interpretable solutions. Customized out-of-sample predictive metrics are developed to evaluate and compare targeted predictors. Through careful use of the posterior predictive distribution, we introduce a procedure that identifies a set of near-optimal, or acceptable, targeted predictors, which provide unique insights into the features and level of complexity needed for accurate targeted prediction. Simulations demonstrate excellent prediction, estimation, and variable selection capabilities. Targeted predictions are constructed for physical activity data from the National Health and Nutrition Examination Survey (NHANES) to better predict and understand the characteristics of intraday physical activity.
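As a rough illustration of the idea of a parametrized action, the following minimal sketch assumes squared error loss and a linear action. Under that assumption, the expected loss splits into a term that does not depend on the action plus the squared distance between the action and the posterior predictive mean of the functional, so the optimal linear action reduces to ordinary least squares on that mean. Everything here is hypothetical and chosen only for illustration: the simulated "posterior predictive" draws, the exceedance functional `h`, and the names `h_bar`, `delta_hat`, and `expected_loss`. It is not the paper's implementation, and it omits the sparsity constraints, which would add a penalty to the same least squares problem.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, S = 200, 5, 500  # observations, covariates, posterior draws
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# ASSUMPTION: stand-in for posterior predictive draws from a fitted Bayesian
# model; in practice these would come from the model's own sampler.
beta_draws = rng.normal(loc=[0.5, 1.0, 0.0, -1.0, 0.0], scale=0.1, size=(S, p))
y_tilde = beta_draws @ X.T + rng.normal(size=(S, n))  # shape (S, n)


def h(y):
    """Hypothetical functional encoding the decision task: exceedance of 1.0."""
    return (y > 1.0).astype(float)


# Monte Carlo estimate of the posterior predictive mean of h at each x_i.
h_bar = h(y_tilde).mean(axis=0)  # shape (n,)

# Linear action X @ delta. Under squared error loss, minimizing the expected
# loss over delta is the same as least squares with h_bar as the response.
delta_hat, *_ = np.linalg.lstsq(X, h_bar, rcond=None)


def expected_loss(delta):
    """Monte Carlo estimate of the posterior predictive expected squared error."""
    return np.mean(np.sum((h(y_tilde) - X @ delta) ** 2, axis=1))


print("targeted linear coefficients:", np.round(delta_hat, 3))
print("estimated expected loss:", round(expected_loss(delta_hat), 3))
```

The design point is that the posterior draws are used once, to form `h_bar`; after that, fitting a different action (for example, a different covariate subset) costs only one least squares solve and does not require refitting the Bayesian model.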
