Sample size determination for logistic regression

Sample size estimation is important in medical applications, especially when measurements of immune biomarkers are expensive. This paper considers sample size determination for logistic regression analysis and reviews the existing algorithms: methods based on univariate statistics, logistic regression, cross-validation and Bayesian inference. Treating the regression model parameters as a multivariate random variable, the authors propose to estimate the sample size from the distance between the parameter distribution functions obtained on cross-validated data sets. The work thereby contributes to data mining and statistical learning, supported by applied mathematics.
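The idea of comparing parameter distributions across data sets of growing size can be illustrated with a minimal sketch. It is not the authors' algorithm. It rests on several assumptions that the abstract does not state: the parameter distribution at each sample size is approximated by a Gaussian, the distance is the Kullback-Leibler divergence, random subsets drawn without replacement stand in for the cross-validated data sets, and the data are synthetic. All function names, the threshold `eps`, and the grid `sizes` are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, ridge=1e-2):
    """Fit logistic regression by Newton's method; a small ridge term keeps the Hessian invertible."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = np.clip(X @ w, -30.0, 30.0)          # clip logits to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        grad = X.T @ (p - y) + ridge * w
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + ridge * np.eye(X.shape[1])
        w -= np.linalg.solve(hess, grad)
    return w

def parameter_distribution(X, y, m, n_resamples, rng):
    """Mean and covariance of the fitted parameters over random subsets of size m (assumed Gaussian)."""
    W = np.empty((n_resamples, X.shape[1]))
    for b in range(n_resamples):
        idx = rng.choice(len(y), size=m, replace=False)
        W[b] = fit_logistic(X[idx], y[idx])
    return W.mean(axis=0), np.cov(W, rowvar=False)

def gaussian_kl(mu0, cov0, mu1, cov1):
    """KL divergence KL(N(mu0, cov0) || N(mu1, cov1)) between two multivariate Gaussians."""
    d = mu0.size
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - d + logdet1 - logdet0)

def divergence_curve(X, y, sizes, n_resamples=200, seed=0):
    """Divergence between parameter distributions at consecutive sample sizes."""
    rng = np.random.default_rng(seed)
    stats = [parameter_distribution(X, y, m, n_resamples, rng) for m in sizes]
    return [gaussian_kl(*stats[i], *stats[i + 1]) for i in range(len(sizes) - 1)]

# Synthetic data (assumption): intercept plus two Gaussian features.
rng = np.random.default_rng(42)
N = 2000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
w_true = np.array([0.5, 1.0, -1.0])
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

sizes = [40, 80, 120, 160, 200]   # hypothetical grid of candidate sample sizes
divs = divergence_curve(X, y, sizes)

eps = 0.1                         # hypothetical stabilization threshold
chosen = next((m for m, d in zip(sizes, divs) if d < eps), None)
for m, d in zip(sizes, divs):
    print(f"m = {m:4d}  divergence to next size = {d:.4f}")
print("smallest size below threshold:", chosen)
```

In this sketch, a sample size is taken as sufficient once adding more observations no longer changes the estimated parameter distribution by more than `eps`. The threshold and the step between candidate sizes both affect the result, so in practice they would need to be chosen for the application at hand.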
