Weighted quantile-based estimation for a class of transformation distributions

Quantile-based methods appear in both statistical inference and exploratory data analysis. Inferential methods based on order statistics generally have extensive theoretical foundations, while exploratory data analysis tends to emphasize graphical methods and often uses selected sets of quantiles, such as the "letter-values" of Tukey (Exploratory Data Analysis, Addison-Wesley, Reading, MA, 1977b). Since transformations of random variables give rise to families of distributions defined through their quantile functions, quantile-based methods can be considered a natural approach for such families. This paper considers quantile-based methods for fitting two such families of distributions, the g-and-k and the adapted g-and-h distributions. Both are formed by transforming the standard normal and were developed to take advantage of certain shape functionals. The effects of different quantiles are taken into account through weighted sums of estimates based on quantiles of the data, where each set of estimates arises from matching location, scale and shape functionals. The methods considered correspond to different criteria for the weighted sums. These iteratively reweighted methods use approximations to the means and variances of the functionals. As a result, they produce not only parameter estimates but also approximate means and variances of those estimates, together with weights that indicate which functionals of the data quantiles are most important. A simulation study is included, and the procedures, distributions and approximations are also illustrated by fitting two air pollution datasets. Comparisons are made with a quick method that uses the median of the set of estimates, and with numerical maximum likelihood estimation, which tends not to be efficient for these families until very large sample sizes are available (Rayner and MacGillivray, 2002).
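Both families are obtained by applying a transformation T to a standard normal variable Z, so their quantile functions have the form Q(u) = T(Φ^{-1}(u)). The Python sketch below writes out these transformations. It assumes the g-and-k form used by Rayner and MacGillivray (2002), with the conventionally chosen constant c = 0.8. It also assumes that the adapted g-and-h replaces the g-and-k tail factor (1 + z^2)^k with exp(h z^2 / 2), as in their generalized g-and-h. The function names are hypothetical, and this illustrates the parameterization only. It is not the paper's MATLAB software.

```python
import math
import random
from statistics import NormalDist


def g_and_k_transform(z, a, b, g, k, c=0.8):
    """Map a standard normal value z to the g-and-k scale (assumed form).

    The skewness factor (1 - exp(-g z)) / (1 + exp(-g z)) is written as tanh(g z / 2).
    a: location, b > 0: scale, g: skewness, k: tail (kurtosis-related) parameter.
    """
    return a + b * (1.0 + c * math.tanh(g * z / 2.0)) * (1.0 + z * z) ** k * z


def g_and_h_transform(z, a, b, g, h, c=0.8):
    """Same skewness factor, with tail factor exp(h z^2 / 2) instead of (1 + z^2)^k."""
    return a + b * (1.0 + c * math.tanh(g * z / 2.0)) * math.exp(h * z * z / 2.0) * z


def g_and_k_quantile(u, a, b, g, k, c=0.8):
    """Quantile function Q(u) = T(Phi^{-1}(u)) for 0 < u < 1."""
    return g_and_k_transform(NormalDist().inv_cdf(u), a, b, g, k, c)


def g_and_h_quantile(u, a, b, g, h, c=0.8):
    """Quantile function of the (assumed) adapted g-and-h form."""
    return g_and_h_transform(NormalDist().inv_cdf(u), a, b, g, h, c)


def sample_g_and_k(n, a, b, g, k, c=0.8, seed=None):
    """Draw n values as T(Z), Z ~ N(0, 1); equivalent to inverse-transform sampling Q(U)."""
    rng = random.Random(seed)
    return [g_and_k_transform(rng.gauss(0.0, 1.0), a, b, g, k, c) for _ in range(n)]
```

Because z = 0 at u = 1/2, Q(1/2) = a for both families, so the median identifies the location parameter. With g = 0 and k = 0 (or h = 0), the transformation reduces to a + b z, which is a normal distribution. Sampling transforms normal draws directly rather than evaluating Φ^{-1} at a uniform draw, which avoids the undefined endpoint u = 0.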
The results indicate that the weighted methods perform better in a number of ways than numerical maximum likelihood estimation for small and moderately sized samples. MATLAB software to carry out the weighted method is available on request.
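The following hypothetical Python sketch illustrates how matching quantile functionals can yield sets of estimates, and how the quick median-of-estimates method then combines them. It relies on two identities that follow from the g-and-k quantile function when z = Φ^{-1}(u) and u > 1/2. First, the quantile skewness functional (Q(u) + Q(1-u) - 2Q(1/2)) / (Q(u) - Q(1-u)) equals c tanh(g z / 2). Second, the spread Q(u) - Q(1-u) equals 2 b z (1 + z^2)^k and does not depend on g. Each quantile level, or each pair of levels, therefore gives its own estimate of g, k or b, and the sketch takes the median of each set.

Several choices here are assumptions rather than details from the paper: the levels used (fourths, eighths, sixteenths and thirty-seconds, echoing the depths of Tukey's letter values but computed by linear interpolation), the clipping step, and the function names. The paper's weighted methods replace these medians with weighted sums whose weights come from approximate means and variances of the functionals, and this sketch does not attempt that.

```python
import math
import statistics
from statistics import NormalDist

# Upper levels u = 1 - 2**-j (0.75, 0.875, 0.9375, 0.96875), each paired with 1 - u.
LEVELS = [1.0 - 2.0 ** -j for j in range(2, 6)]


def quick_fit_from_quantiles(q, c=0.8, levels=LEVELS):
    """Median-of-estimates fit of g-and-k (a, b, g, k) from a quantile function q(u).

    For u > 1/2 and z = Phi^{-1}(u):
      (q(u) + q(1-u) - 2 q(1/2)) / (q(u) - q(1-u)) = c * tanh(g z / 2)   (skewness)
      q(u) - q(1-u) = 2 b z (1 + z**2) ** k                               (spread)
    """
    zs = [NormalDist().inv_cdf(u) for u in levels]
    a = q(0.5)  # location: Q(1/2) = a because z = 0 at the median
    spreads = [q(u) - q(1.0 - u) for u in levels]  # assumed strictly positive

    # One estimate of g per level, by inverting c * tanh(g z / 2).
    g_est = []
    for u, z, d in zip(levels, zs, spreads):
        t = (q(u) + q(1.0 - u) - 2.0 * a) / (c * d)
        t = max(-0.999999, min(0.999999, t))  # clip so atanh stays finite
        g_est.append(2.0 * math.atanh(t) / z)

    # One estimate of k per pair of levels, from the ratio of their spreads.
    k_est = []
    for i in range(len(levels)):
        for j in range(i + 1, len(levels)):
            zi, zj = zs[i], zs[j]
            num = math.log(spreads[j] * zi / (spreads[i] * zj))
            k_est.append(num / math.log((1.0 + zj * zj) / (1.0 + zi * zi)))
    k = statistics.median(k_est)

    # One estimate of b per level, using the pooled (median) k.
    b_est = [d / (2.0 * z * (1.0 + z * z) ** k) for z, d in zip(zs, spreads)]
    return a, statistics.median(b_est), statistics.median(g_est), k


def quick_fit(data, c=0.8):
    """Apply quick_fit_from_quantiles to linearly interpolated sample quantiles."""
    m = 32  # every level in LEVELS (and 1/2) is a multiple of 1/32
    cuts = statistics.quantiles(data, n=m, method="inclusive")  # cuts[i - 1] ~ Q(i / 32)
    return quick_fit_from_quantiles(lambda u: cuts[round(u * m) - 1], c=c)
```

Passing the exact g-and-k quantile function to `quick_fit_from_quantiles` recovers (a, b, g, k) up to rounding, which makes a convenient check. With sample data, the estimates from different levels vary in precision, and this is what the weighted sums described above take into account.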

[1] Regina Y. Liu, et al. Multivariate Analysis by Data Depth: Descriptive Statistics, Graphics and Inference (with Discussion and a Rejoinder by Liu and Singh), 1999.

[2] P. Bickel, et al. Descriptive Statistics for Nonparametric Models IV. Spread, 1979.

[3] H. L. MacGillivray, et al. Shape Properties of the g-and-h and Johnson Families, 1992.

[4] J. Jacquelin, et al. A Reliable Algorithm for the Exact Median Rank Function, 1993.

[5] H. L. MacGillivray, et al. The Relationships Between Skewness and Kurtosis, 1988.

[6] Kjell A. Doksum, et al. Measures of Location and Asymmetry, 1975.

[7] Marshall Freimer, et al. A Study of the Generalized Tukey Lambda Family, 1988.

[8] K. Mengersen, et al. Robustness of Ranking and Selection Rules Using Generalised g-and-k Distributions, 1997.

[9] Kevin P. Balanda, et al. Kurtosis and Spread, 1990.

[10] Bruce W. Schmeiser, et al. An Approximate Method for Generating Symmetric Random Variables, 1972, CACM.

[11] M. J. Lawrence. Inequalities of s-Ordered Distributions, 1975.

[12] S. Shapiro, et al. An Analysis of Variance Test for Normality (Complete Samples), 1965.

[13] Robert E. Wheeler, et al. Quantile Estimators of Johnson Curve Parameters, 1980.

[14] A. Öztürk, et al. Least Squares Estimation of the Parameters of the Generalized Lambda Distribution, 1985.

[15] John S. Ramberg, et al. Fitting a Distribution to Data Using an Alternative to Moments, 1979, WSC '79.

[16] N. L. Johnson, et al. Systems of Frequency Curves Generated by Methods of Translation, 1949, Biometrika.

[17] Frederick Mosteller, et al. Exploring Data Tables, Trends and Shapes, 1986.

[18] James B. McDonald, et al. Model Selection: Some Generalized Distributions, 1987.

[19] W. R. van Zwet. Convex Transformations of Random Variables, 1965.

[20] G. D. Rayner, et al. Numerical Maximum Likelihood Estimation for the g-and-k and Generalized g-and-h Distributions, 2002, Stat. Comput.

[21] John C. Fothergill, et al. Estimating the Cumulative Probability of Failure Data Points to Be Plotted on Weibull and Other Probability Paper, 1990.

[22] Pandu R. Tadikamalla, et al. Systems of Frequency Curves Generated by Transformations of Logistic Variables, 1982.

[23] Jorge Martinez, et al. Some Properties of the Tukey g and h Family of Distributions, 1984.

[24] John W. Tukey, et al. Fitting Quantiles: Doubling, HR, HQ, and HHH Distributions, 2000.

[25] H. L. MacGillivray, et al. A Starship Estimation Method for the Generalized λ Distributions, 1999.

[26] Richard A. Groeneveld, et al. Measuring Skewness and Kurtosis, 1984.

[27] S. Shapiro, et al. The Johnson System: Selection and Parameter Estimation, 1980.

[28] H. L. MacGillivray, et al. Skewness and Asymmetry: Measures and Orderings, 1986.

[29] N. L. Johnson. Systems of Frequency Curves Derived from the First Law of Laplace, 1955.

[30] Pandu R. Tadikamalla, et al. A Probability Distribution and Its Uses in Fitting Data, 1979.

[31] M. A. Shayib, et al. The Procedure for Selection of Transformations from the Johnson System, 1989.

[32] Edward J. Dudewicz, et al. The Extended Generalized Lambda Distribution (EGLD) System for Fitting Distributions to Data with Moments, II: Tables, 1996.

[33] E. L. Lehmann, et al. Descriptive Statistics for Nonparametric Models II. Location, 1975.

[34] P. Bickel, et al. Descriptive Statistics for Nonparametric Models III. Dispersion, 1976.

[35] Wei-Yin Loh, et al. Bounds on AREs for Restricted Classes of Distributions Defined via Tail-Orderings, 1984.