Logistic Regression and Bayesian Networks to Study Outcomes Using Large Data Sets

Background: In nursing research, interest in using large health care databases to predict nursing-sensitive outcomes is growing rapidly. Traditionally, one of the most frequently used methods is logistic regression (LR), which, although powerful and familiar, has several limitations when used to analyze large databases. As a result, innovative approaches are required.

Approach: This article (a) introduces an alternative data analysis approach, the Bayesian network (BN); (b) discusses the constraints of LR and the complementary advantages of BNs in working with large and multidimensional health care data; and (c) provides a fundamental understanding of the use of BNs in the nursing/health care domain.

Results: Studies have shown that BNs have several advantages over LR in analyzing large and complex data: (a) statistical assumptions, such as linearity and additivity, are relaxed; (b) handling a larger number of predictors and identifying interactions among predictors is less complex; and (c) the discovery of structure, pattern, and knowledge in data (for example, of unknown, complex, and nonlinear relationships) is facilitated.

Conclusion: Outcome studies, such as those undertaken by nurse researchers, may benefit from examining and using innovative approaches such as BNs to analyze very large and complex health care data sets.
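The point about additivity and interactions can be illustrated with a small sketch. It is not taken from the article: the predictors A and B, the outcome Y, and the synthetic data are all hypothetical, and the "network" is only the simplest possible fragment, one outcome node with two parent nodes whose conditional probability table (CPT) is estimated by counting. The outcome follows a noisy XOR pattern, so it depends on the predictors only through their interaction. A main-effects-only LR is compared with the CPT.

```python
import math
from collections import Counter

# Hypothetical binary predictors A, B and binary outcome Y (illustrative only).
# Y follows an XOR pattern with 20% noise: a pure interaction, no main effects.
rows = []
for a in (0, 1):
    for b in (0, 1):
        likely = a ^ b
        rows += [(a, b, likely)] * 20 + [(a, b, 1 - likely)] * 5


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_main_effects_lr(data, steps=500, rate=0.1):
    """Fit logit P(Y=1) = w0 + w1*A + w2*B by batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for a, b, y in data:
            p = sigmoid(w[0] + w[1] * a + w[2] * b)
            for j, x in enumerate((1, a, b)):
                grad[j] += (p - y) * x
        w = [wj - rate * gj / len(data) for wj, gj in zip(w, grad)]
    return w


def fit_cpt(data):
    """Estimate P(Y=1 | A, B) for every parent configuration by counting."""
    total = Counter((a, b) for a, b, _ in data)
    ones = Counter((a, b) for a, b, y in data if y == 1)
    return {key: ones[key] / total[key] for key in total}


def accuracy(predict, data):
    return sum(predict(a, b) == y for a, b, y in data) / len(data)


w = fit_main_effects_lr(rows)
cpt = fit_cpt(rows)

lr_acc = accuracy(lambda a, b: int(sigmoid(w[0] + w[1] * a + w[2] * b) >= 0.5), rows)
cpt_acc = accuracy(lambda a, b: int(cpt[(a, b)] >= 0.5), rows)

print("main-effects LR accuracy:", lr_acc)
print("CPT accuracy:", cpt_acc)
print("P(Y=1 | A, B):", cpt)
```

On this toy data the main-effects LR cannot do better than chance, because neither predictor is informative on its own, while the CPT recovers the 80% pattern directly. An analyst could repair the LR by adding an A×B product term, but that requires knowing in advance which interaction to specify; the CPT needs no such specification. The sketch does not cover structure learning or inference across several nodes, which a full BN analysis would involve.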
