Outlier Detection in Logistic Regression

Logistic regression, including its modelling, the decisions drawn from the estimated model, and the subsequent analysis, has drawn a great deal of attention since its inception. Logistic regression methods are currently used in epidemiology, biomedical research, criminology, ecology, engineering, pattern recognition, machine learning, wildlife biology, linguistics, business and finance, among other fields. Logistic regression diagnostics have attracted both theoreticians and practitioners in recent years. Detecting and handling outliers is an important task in data modelling, because outliers often distort model performance. Traditionally, logistic regression models were fitted to data obtained under experimental conditions. In recent years, however, measuring the extent of outliers before using the data as input to a logistic model has become an important issue. The topic requires a higher mathematical level than most related material, which has held back its study and application despite its unavoidable importance. This chapter presents several diagnostic aspects and methods in logistic regression. As in linear regression, estimates in logistic regression are sensitive to unusual observations: outliers, high-leverage points, and influential observations. Numerical examples and analyses using data sets from the medical domain demonstrate the most recent outlier diagnostic methods.
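To make the three kinds of unusual observation concrete, the following is a minimal sketch, not the chapter's own procedure, of two classical diagnostics for a fitted logistic model: the leverage values taken from the diagonal of the weighted hat matrix, and the Pearson residuals standardized by that leverage. The function names `fit_logistic_irls` and `logistic_diagnostics` and the toy data are hypothetical and chosen only for illustration; the model is fitted by plain Newton-Raphson with no safeguards against separation.

```python
import numpy as np


def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Fit logistic regression by Newton-Raphson (IRLS).

    X must already contain an intercept column. Hypothetical helper,
    written for illustration only.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                      # binomial variance weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # X' W X
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def logistic_diagnostics(X, y, beta):
    """Return fitted probabilities, leverages, and Pearson residuals."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    # h_i = w_i * x_i' (X' W X)^{-1} x_i, the diagonal of the weighted hat matrix
    leverage = w * np.einsum("ij,jk,ik->i", X, cov, X)
    pearson = (y - p) / np.sqrt(w)
    standardized = pearson / np.sqrt(1.0 - leverage)
    return p, leverage, pearson, standardized


# Toy data (assumed, not from the chapter): the last case has an extreme
# covariate value and a response that disagrees with the overall trend.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 20], dtype=float)
y = np.array([0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0], dtype=float)
X = np.column_stack([np.ones_like(x), x])

beta = fit_logistic_irls(X, y)
p, leverage, pearson, standardized = logistic_diagnostics(X, y, beta)

# List the three cases with the largest absolute standardized residual.
for i in np.argsort(-np.abs(standardized))[:3]:
    print(f"case {i}: x={x[i]:.0f}, y={y[i]:.0f}, "
          f"leverage={leverage[i]:.3f}, std. residual={standardized[i]:.3f}")
```

A large standardized residual points to a possible outlier in the response, while a large leverage points to an unusual covariate pattern. Which cases are actually influential depends on both quantities together, and single-case measures of this kind can be masked when several unusual observations occur at once.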
