Efficient Estimation of Smooth Distributions From Coarsely Grouped Data

Ungrouping binned data can be desirable for many reasons: bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different histograms use different grouping schemes; and the last interval is often wide and open-ended and thus masks much of the information in the tail area. Age group–specific disease incidence rates and abridged life tables are examples of binned data. We propose a versatile method for ungrouping histograms that assumes only that the underlying distribution is smooth. Because this assumption is modest, the approach is suitable for most applications. The method is based on the composite link model, with a penalty added to ensure the smoothness of the target distribution. Estimates are obtained by maximizing a penalized likelihood, and this maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. The optimal value of the smoothing parameter is chosen by minimizing Akaike's Information Criterion. We demonstrate the performance of the method in a simulation study and provide several examples that illustrate the approach, including the proper handling of wide, open-ended intervals. The method can be extended to the estimation of rates when both the event counts and the exposures to risk are grouped.
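The estimation steps named in the abstract (composite link model, smoothness penalty, iteratively reweighted least squares, AIC) can be illustrated with a minimal sketch. It is not the authors' implementation. The function name `pclm_fit`, the identity basis on a fine grid, the second-order difference penalty, the flat starting values, the toy counts, and the grid of smoothing parameters are all assumptions made for illustration.

```python
import numpy as np

def pclm_fit(y, C, lam, max_iter=200, tol=1e-8):
    """Illustrative penalized composite link model fit (hypothetical helper).

    y   : grouped counts, one per bin
    C   : composition matrix (bins x fine cells); C[i, j] = 1 if cell j lies in bin i
    lam : smoothing parameter
    Returns the estimated fine-grid expected counts and the AIC.
    """
    y = np.asarray(y, dtype=float)
    n_fine = C.shape[1]
    # Assumption: second-order difference penalty on the log of the fine-grid counts.
    D = np.diff(np.eye(n_fine), n=2, axis=0)
    P = lam * (D.T @ D)
    # Flat start: spread the total count evenly over the fine grid.
    beta = np.full(n_fine, np.log(y.sum() / n_fine))

    for _ in range(max_iter):
        gamma = np.exp(beta)            # latent fine-grid expected counts
        mu = C @ gamma                  # expected counts of the observed bins
        Q = (C * gamma) / mu[:, None]   # working design matrix
        G = (Q.T * mu) @ Q              # Q' W Q with W = diag(mu)
        beta_new = np.linalg.solve(G + P, Q.T @ (y - mu) + G @ beta)
        step = np.max(np.abs(beta_new - beta))
        beta = beta_new
        if step < tol:
            break

    gamma = np.exp(beta)
    mu = C @ gamma
    Q = (C * gamma) / mu[:, None]
    G = (Q.T * mu) @ Q
    pos = y > 0
    # Poisson deviance of the grouped counts.
    deviance = 2.0 * (np.sum(y[pos] * np.log(y[pos] / mu[pos])) - np.sum(y - mu))
    # Effective dimension: trace of the hat matrix of the final weighted fit.
    eff_dim = np.trace(np.linalg.solve(G + P, G))
    return gamma, deviance + 2.0 * eff_dim

# Toy data (made up): six bins of width 10 over a fine grid of 60 unit cells.
y = np.array([10, 25, 35, 20, 8, 2])
C = np.kron(np.eye(6), np.ones((1, 10)))

# Choose the smoothing parameter on a small log-spaced grid by minimizing AIC.
lambdas = 10.0 ** np.arange(0, 5)
fits = [pclm_fit(y, C, lam) for lam in lambdas]
best = int(np.argmin([aic for _, aic in fits]))
gamma_hat, aic_hat = fits[best]
```

In this sketch `gamma_hat` holds the ungrouped expected counts on the fine grid, and `C @ gamma_hat` gives the fitted counts for the original bins. A wide, open-ended last interval would correspond to a row of `C` that spans many fine cells.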
