A comparison of efficient approximations for a weighted sum of chi-squared random variables

In many applications, the cumulative distribution function (cdf) $$F_{Q_N}$$ of $$Q_N$$, a positively weighted sum of N i.i.d. chi-squared random variables, is required. Although there is no known closed-form solution for $$F_{Q_N}$$, there are many good approximations. When computational efficiency is not an issue, Imhof’s method provides a good solution. However, when both the accuracy of the approximation and the speed of its computation are a concern, there is no clear preferred choice. Previous comparisons between approximate methods could be considered insufficient. Furthermore, in streaming data applications, where the computation needs to be both sequential and efficient, only a few of the available methods may be suitable. Streaming data problems are becoming ubiquitous and provide the motivation for this paper. We develop a framework to enable a much more extensive comparison between approximate methods for computing the cdf of weighted sums of an arbitrary random variable. Using this framework, we perform a new and comprehensive analysis of four efficient approximate methods for computing $$F_{Q_N}$$. This analysis procedure is much more thorough and statistically valid than previous approaches described in the literature. A surprising result of this analysis is that the accuracy of these approximate methods increases with N.
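
The abstract does not name the four methods it compares. As an illustration of what an efficient approximation to $$F_{Q_N}$$ can look like, the sketch below uses the classical two-moment (Satterthwaite-Welch style) fit, which replaces $$Q_N$$ by a scaled chi-squared variable with the same mean and variance. The function names, the default of one degree of freedom per term, and the series-based incomplete gamma routine are assumptions made for this example, not the paper's implementation.

```python
import math

def reg_lower_gamma(a, x, tol=1e-12, max_terms=10_000):
    """Regularized lower incomplete gamma P(a, x), computed from its power series.

    Adequate for a sketch; not tuned for very large x.
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    # Multiply the series by x^a * exp(-x) / Gamma(a), evaluated in log space.
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))

def two_moment_cdf(x, weights, df=1):
    """Approximate P(Q_N <= x) for Q_N = sum_i w_i X_i, with X_i i.i.d. chi-squared(df).

    Assumption for this example: Q_N is replaced by g * chi-squared(h), with g and h
    chosen so that the mean and variance match those of Q_N.
    """
    s1 = sum(weights)                   # E[Q_N]   = df * s1
    s2 = sum(w * w for w in weights)    # Var[Q_N] = 2 * df * s2
    g = s2 / s1                         # scale factor
    h = df * s1 * s1 / s2               # effective degrees of freedom
    # cdf of chi-squared(h) at x / g is P(h / 2, x / (2 g)).
    return reg_lower_gamma(h / 2.0, x / (2.0 * g))

# Hypothetical weights, chosen only to show the call.
p = two_moment_cdf(3.0, [0.5, 0.3, 0.2])
```

Only the running sums of the weights and of their squares are needed, and both can be updated in constant time as new weights arrive, which is the kind of property that matters in the sequential setting the abstract describes.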
