Testing Tail Weight of a Distribution Via Hazard Rate

The shape of a data distribution matters in many fields because it can determine which algorithms are appropriate for that data. Given samples from a distribution, we seek to understand how many elements appear infrequently, that is, to characterize the tail of the distribution. We develop an algorithm, based on a careful bucketing scheme, that distinguishes heavy-tailed distributions from non-heavy-tailed ones. Heavy-tailedness is defined in terms of the hazard rate, and the guarantees hold under some natural smoothness and ordering assumptions. We verify our theoretical results empirically.
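
The abstract does not spell out the hazard rate, so the following is a minimal sketch of the standard discrete definition, h(x) = P(X = x) / P(X >= x), estimated directly from samples. It is not the bucketing algorithm described above. The function name `empirical_hazard` and the toy sample are assumptions made for illustration only.

```python
from collections import Counter


def empirical_hazard(samples):
    """Estimate the discrete hazard rate h(x) = P(X = x) / P(X >= x).

    Illustrative helper (hypothetical name); returns a dict mapping each
    observed value x to its empirical hazard rate.
    """
    counts = Counter(samples)
    remaining = len(samples)  # number of samples with value >= current x
    hazard = {}
    for x in sorted(counts):
        hazard[x] = counts[x] / remaining
        remaining -= counts[x]
    return hazard


# Toy sample whose counts halve at each step, mimicking a geometric shape.
samples = [1] * 8 + [2] * 4 + [3] * 2 + [4] * 1 + [5] * 1
hazard = empirical_hazard(samples)
for x, h in hazard.items():
    print(x, h)
```

In this toy sample the estimate stays at 0.5 until the largest observed value, where it is 1 by construction because no larger samples remain. A constant hazard rate is characteristic of the geometric distribution, whereas distributions such as the Pareto have a hazard rate that decreases with x, which is the kind of behavior hazard-rate-based notions of tail weight typically capture.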