Minimum sample size determination of vibration signals in machine learning approach to fault diagnosis using power analysis

The machine learning approach to fault diagnosis consists of a chain of activities: data acquisition, feature extraction, feature selection and classification. Each activity is equally important to the diagnosis. Because machine learning is a soft science, many choices that researchers currently make arbitrarily or heuristically could instead be given a mathematical justification. The minimum number of samples required to separate fault conditions with statistical stability is one such choice. This paper provides a method for determining the minimum sample size using power analysis. A typical bearing fault diagnosis problem is taken as an illustrative case, and the results are compared with those of an entropy-based algorithm (J48) for determining minimum sample size. The results will serve as a guideline for researchers working in fault diagnosis when choosing an appropriate sample size.
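As a rough illustration of how power analysis yields a minimum sample size, the sketch below uses the standard normal-approximation formula for comparing the means of two groups on a single feature. This is an assumed, simplified setting and not necessarily the test used in the paper: the function name `min_samples_per_class`, the choice of a two-sided two-sample comparison, and the default significance level and power are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def min_samples_per_class(effect_size, alpha=0.05, power=0.80):
    """Approximate samples needed per class to detect a mean difference.

    Hypothetical helper (not from the paper). Uses the normal approximation
    for a two-sided, two-sample comparison of means:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power

    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)  # round up: sample counts are whole numbers


if __name__ == "__main__":
    # Smaller separation between conditions requires more samples.
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: {min_samples_per_class(d)} samples per class")
```

The normal approximation slightly underestimates the size an exact t-test calculation would give, so the result is best read as a lower bound. In practice the effect size would be estimated from pilot vibration data for the features of interest.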
