HETEROGENEITY AND MODEL UNCERTAINTY IN BAYESIAN REGRESSION

Data heterogeneity appears when the sample comes from at least two different populations. We analyze three types of situations. In the first and simplest case, the majority of the data come from a central model and a few isolated observations come from a contaminating distribution. The data from the contaminating distribution are called outliers, and they have been studied in depth in the statistical literature. In the second case we still have a central model, but the heterogeneous data may appear in clusters of outliers which mask each other. This is the multiple outlier problem, which is much more difficult to handle and which has been analyzed and understood in the last few years. The few Bayesian contributions to this problem are presented. In the third case we do not have a central model; instead, different groups of data have been generated by different models. For multivariate normal data this problem has been analyzed by mixture models under the name of cluster analysis, but a challenging area of research is to develop a general methodology for applying this multiple model approach to other statistical problems. Heterogeneity implies in general an increase in the uncertainty of predictions, and we present in this paper a procedure to measure this effect.

1 We acknowledge support for this work from DGES, A. Justel from Grant PB97-0021 and D. Peña from grant 96-0111.
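The masking effect mentioned in the abstract can be illustrated with a small numerical sketch. It is not the Bayesian procedure of the paper: it is a simplified, location-only example (no regressors) that compares classical standardized scores with scores based on the median and the median absolute deviation. The data values, the cutoff of 3, and the function names `classical_scores`, `robust_scores` and `flag` are all assumptions made for illustration.

```python
import statistics

def classical_scores(values):
    """Standardize with the sample mean and sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def robust_scores(values):
    """Standardize with the median and the scaled median absolute deviation."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes the MAD consistent for the normal sd
    return [(v - med) / scale for v in values]

def flag(scores, threshold=3.0):
    """Indices of observations whose absolute score exceeds the cutoff."""
    return [i for i, s in enumerate(scores) if abs(s) > threshold]

# Hypothetical data: 20 observations from a "central model" centred at 10.
clean = [10 + 0.1 * (i - 9.5) for i in range(20)]

# Case 1: a single isolated outlier.
one_outlier = clean + [20.0]

# Case 2: a cluster of five identical outliers that can mask each other.
cluster = clean + [20.0] * 5

single_flags = flag(classical_scores(one_outlier))
masked_flags = flag(classical_scores(cluster))
robust_flags = flag(robust_scores(cluster))

print("single outlier, classical scores:", single_flags)
print("outlier cluster, classical scores:", masked_flags)
print("outlier cluster, robust scores:", robust_flags)
```

With these assumed data, the isolated outlier is flagged by the classical scores, while the cluster of five inflates the mean and standard deviation enough that none of its members exceeds the cutoff. The median-based scores still flag all five, which is the sense in which clustered outliers mask each other under non-robust estimates.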
