Breakdown of statistical inference from some random experiments

Many experiments can be interpreted in terms of random processes operating according to some internal protocols. When experiments are costly or cannot be repeated, only one or a few finite samples are available. In this paper we study data generated by pseudo-random computer experiments operating according to particular internal protocols. We show that standard statistical analysis performed on a sample containing 10^5 data points or more may sometimes be highly misleading, with statistical errors largely underestimated. Our results confirm in a dramatic way the dangers of standard asymptotic statistical inference when a sample is not homogeneous. We demonstrate that analyzing various subdivisions of samples by multiple chi-square tests and chi-square frequency graphs is very effective in detecting sample inhomogeneity. Therefore, to ensure correct statistical inference, the above-mentioned chi-square tests and other non-parametric sample homogeneity tests should be incorporated in any statistical analysis of experimental data. If such tests are not performed, the reported conclusions and error estimates cannot be trusted. (C) 2015 Elsevier B.V. All rights reserved.
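The idea of testing subdivisions of a sample for homogeneity can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `chi2_homogeneity` and `chi2_sf_approx`, the choice of ten consecutive blocks, the three-category outcome, and the "drifting protocol" with its probabilities are all assumptions made for illustration. The p-value uses the Wilson-Hilferty normal approximation to the chi-square tail, so it is only approximate.

```python
import math
import random
from collections import Counter


def chi2_homogeneity(sample, n_blocks):
    """Chi-square statistic for homogeneity across consecutive blocks.

    Splits a categorical sample into n_blocks equal consecutive blocks and
    compares each block's category counts with the pooled proportions.
    Returns (statistic, degrees_of_freedom).
    """
    size = len(sample) // n_blocks
    used = sample[:size * n_blocks]          # drop any leftover tail
    categories = sorted(set(used))
    counts = [Counter(used[i * size:(i + 1) * size]) for i in range(n_blocks)]
    pooled = Counter(used)
    stat = 0.0
    for block_counts in counts:
        for c in categories:
            expected = size * pooled[c] / len(used)
            stat += (block_counts[c] - expected) ** 2 / expected
    dof = (n_blocks - 1) * (len(categories) - 1)
    return stat, dof


def chi2_sf_approx(stat, dof):
    """Approximate upper-tail probability (Wilson-Hilferty approximation)."""
    z = ((stat / dof) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * dof))) \
        / math.sqrt(2.0 / (9.0 * dof))
    return 0.5 * math.erfc(z / math.sqrt(2.0))


rng = random.Random(0)
n = 100_000
outcomes = [0, 1, 2]

# Homogeneous sample: one fixed protocol for all draws (illustrative weights).
good = rng.choices(outcomes, weights=[0.33, 0.335, 0.335], k=n)

# Inhomogeneous sample: the hypothetical protocol changes halfway through.
bad = (rng.choices(outcomes, weights=[0.30, 0.35, 0.35], k=n // 2)
       + rng.choices(outcomes, weights=[0.36, 0.32, 0.32], k=n // 2))

stat_good, dof_good = chi2_homogeneity(good, 10)
stat_bad, dof_bad = chi2_homogeneity(bad, 10)
p_good = chi2_sf_approx(stat_good, dof_good)
p_bad = chi2_sf_approx(stat_bad, dof_bad)

print(f"homogeneous:   chi2={stat_good:.1f}, dof={dof_good}, p~{p_good:.3g}")
print(f"inhomogeneous: chi2={stat_bad:.1f}, dof={dof_bad}, p~{p_bad:.3g}")
```

In this sketch both samples have nearly the same pooled frequencies, so an analysis of the whole sample alone would not distinguish them; only the block-wise comparison exposes the change of protocol. Repeating the test for several block counts, as the abstract suggests for "various subdivisions", is a natural extension.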
