The Lambert Way to Gaussianize heavy-tailed data with the inverse of Tukey's h as a special case

I present a parametric, bijective transformation to generate heavy-tailed versions Y of arbitrary random variables (RVs) X ~ F. The tail behavior of the so-called 'heavy tail Lambert W x F' RV Y depends on a tail parameter delta >= 0: for delta = 0, Y = X; for delta > 0, Y has heavier tails than X. For Gaussian X, this meta-family of heavy-tailed distributions reduces to Tukey's h distribution. Lambert's W function provides an explicit inverse transformation, whose parameters can be estimated by maximum likelihood. This inverse can remove heavy tails from data, and it also provides analytical expressions for the cumulative distribution function (cdf) and probability density function (pdf). As a special case, these yield explicit formulas for the pdf and cdf of Tukey's h distribution, to the author's knowledge for the first time in the literature. Simulations and applications to S&P 500 log-returns and solar flare data demonstrate the usefulness of the introduced methodology. The R package "LambertW" (cran.r-project.org/web/packages/LambertW), which implements the presented methodology, is publicly available on CRAN.
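To make the transformation and its Lambert W inverse concrete, here is a minimal Python sketch. It assumes the standardized form of Tukey's h transformation, z = u * exp(delta/2 * u^2), applied to an already centered and scaled input u; the abstract does not spell out the formula or any location-scale handling, so that form is an assumption here. The function names `lambert_w0`, `heavy_tail` and `gaussianize` are hypothetical and are not the API of the "LambertW" R package. The inverse follows from delta * z^2 = (delta * u^2) * exp(delta * u^2), so that W(delta * z^2) = delta * u^2.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch of Lambert's W for x >= 0, i.e. the w with w*exp(w) = x.

    Solved numerically with Halley's iteration (an illustrative choice).
    """
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    if x == 0.0:
        return 0.0
    w = math.log1p(x)  # starting guess, exact at x = 0
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x
        # Halley update for f(w) = w*exp(w) - x
        step = f / (ew * (w + 1.0) - (w + 2.0) * f / (2.0 * w + 2.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def heavy_tail(u, delta):
    """Assumed forward map: z = u * exp(delta/2 * u^2), with delta >= 0."""
    return u * math.exp(0.5 * delta * u * u)


def gaussianize(z, delta):
    """Inverse map: u = sign(z) * sqrt(W(delta * z^2) / delta)."""
    if delta == 0.0:
        return z  # delta = 0 is the identity, matching Y = X in the text
    w = lambert_w0(delta * z * z)
    return math.copysign(math.sqrt(w / delta), z)


if __name__ == "__main__":
    delta = 0.3
    for u in (-2.5, -0.4, 0.0, 1.0, 3.0):
        z = heavy_tail(u, delta)
        print(f"u={u:+.3f}  z={z:+.6f}  recovered={gaussianize(z, delta):+.6f}")
```

In this sketch the forward map stretches values more the farther they lie from zero, which is what produces the heavier tails, and the inverse undoes that stretch exactly. Estimating delta from data by maximum likelihood, as described above, is outside the scope of this example.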
