A new look at the inverse Gaussian distribution with applications to insurance and economic data

ABSTRACT Insurance and economic data are often positive, and this feature should be taken into account when choosing a statistical model for their distribution. One suitable model is the inverse Gaussian (IG) distribution, one of the best-known and most widely used distributions with positive support. With the aim of increasing the use of the IG distribution on insurance and economic data, we propose a convenient mode-based parameterization yielding the reparameterized IG (rIG) distribution; it enables or simplifies the use of the IG distribution in various branches of statistics, and we give some examples. In nonparametric statistics, we define a smoother based on rIG kernels. By construction, the estimator is well-defined and does not allocate probability mass to unrealistic negative values. We adopt likelihood cross-validation to select the smoothing parameter. In robust statistics, we propose the contaminated IG distribution, a heavy-tailed generalization of the rIG distribution that accommodates mild outliers. Finally, for model-based clustering and semiparametric density estimation, we present finite mixtures of rIG distributions. We use the EM algorithm to obtain maximum likelihood estimates of the parameters of the mixture and contaminated models. We illustrate the models on insurance data about bodily injury claims and on economic data about incomes of Italian households.
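The abstract does not state the rIG density or the kernel construction, so the following is only a minimal Python sketch under stated assumptions. It assumes the mode-based parameterization that follows from the standard IG density with mean mu and shape lambda: setting gamma = mu^2 / lambda, the mode theta satisfies mu = sqrt(theta * (3 * gamma + theta)). It further assumes, by analogy with gamma and beta kernel estimators, that the smoother places the kernel mode at the evaluation point and uses gamma as the smoothing parameter h. The function names `rig_pdf`, `rig_kde`, `lcv_score`, and `select_h` are hypothetical.

```python
import numpy as np


def rig_pdf(x, mode, gamma):
    """IG density on x > 0, indexed by its mode and a variability parameter gamma.

    Assumed parameterization: gamma = mu^2 / lambda in the usual (mu, lambda)
    form, so that mu = sqrt(mode * (3 * gamma + mode)).
    """
    x = np.asarray(x, dtype=float)
    mu = np.sqrt(mode * (3.0 * gamma + mode))  # mean implied by mode and gamma
    return mu / np.sqrt(2.0 * np.pi * gamma * x**3) * np.exp(
        -((x - mu) ** 2) / (2.0 * gamma * x)
    )


def rig_kde(x_eval, data, h):
    """Kernel smoother: average of rIG kernels with mode at each evaluation point.

    Assumption: the kernel with mode x_eval and gamma = h is evaluated at the
    observations, as in gamma-kernel estimators. Only x_eval > 0 is supported.
    """
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    data = np.asarray(data, dtype=float)
    # Rows index evaluation points (kernel modes), columns index observations.
    return rig_pdf(data[None, :], x_eval[:, None], h).mean(axis=1)


def lcv_score(data, h):
    """Leave-one-out log-likelihood of the smoother for smoothing parameter h."""
    data = np.asarray(data, dtype=float)
    n = data.size
    k = rig_pdf(data[None, :], data[:, None], h)  # k[i, j]: mode x_i, evaluated at x_j
    np.fill_diagonal(k, 0.0)  # drop each observation's own contribution
    loo = k.sum(axis=1) / (n - 1)
    return np.log(loo).sum()


def select_h(data, grid):
    """Pick the smoothing parameter on a candidate grid that maximizes the LCV score."""
    scores = [lcv_score(data, h) for h in grid]
    return grid[int(np.argmax(scores))]
```

For positive data `x`, `select_h(x, [0.05, 0.1, 0.2, 0.5, 1.0])` returns the grid value with the largest leave-one-out log-likelihood, and `rig_kde(grid_points, x, h)` then evaluates the smoother at positive `grid_points`. A grid search is used here only for simplicity; a numerical optimizer could replace it.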
