A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation

Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model whose assumptions are minimal and not violated by the data, so that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone, menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, built with MATLAB Compiler. Currently, the package gives the user a choice of 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. In each BNP infinite-mixture model, the mixture distribution is assigned one of several BNP prior distributions, including priors defined by the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model’s posterior distribution and of the model’s predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model’s posterior predictive distribution for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
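To make the idea of a Dirichlet process prior on a mixture distribution concrete, the following is a minimal sketch of the stick-breaking construction, truncated to a finite number of atoms. It is not the package's implementation: the function names, the truncation level, the N(0, 3^2) base distribution for the component means, and the unit-variance normal kernel are all assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha, num_atoms, rng):
    """Truncated stick-breaking weights for a Dirichlet process with mass alpha."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any atom
    for _ in range(num_atoms - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: the last atom absorbs the leftover mass
    return weights


def sample_dp_normal_mixture(alpha, num_atoms, n, rng):
    """Draw n values from one random normal mixture generated by the truncated prior."""
    weights = stick_breaking_weights(alpha, num_atoms, rng)
    # Assumed base distribution for the component means: N(0, 3^2).
    means = [rng.gauss(0.0, 3.0) for _ in range(num_atoms)]
    # Pick a mixture component for each observation, then sample from its kernel.
    components = rng.choices(range(num_atoms), weights=weights, k=n)
    draws = [rng.gauss(means[c], 1.0) for c in components]  # assumed unit-variance kernel
    return draws, weights


rng = random.Random(0)
draws, weights = sample_dp_normal_mixture(alpha=1.0, num_atoms=50, n=200, rng=rng)
print(len(draws), round(sum(weights), 6))
```

Smaller values of `alpha` concentrate most of the mass on the first few atoms, so the sampled density has few visible components; larger values spread the mass over more atoms. A posterior analysis would additionally update the weights and atoms given data via MCMC, which this sketch does not attempt.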
