Model‐free data screening and cleaning

Categories of data screening and cleaning issues are reviewed and discussed. Bivariate model-free approaches and specialized model-free regression techniques, together with order statistic-based (OSB) approaches, are illustrated and shown to be of particular value for a variety of data screening and cleaning applications. A combination of curve estimation with OSB approaches is shown to help screen a data set on progression to AIDS.

WIREs Comp Stat 2011, 3:168–176. DOI: 10.1002/wics.140
