Network or regression-based methods for disease discrimination: a comparison study

Background: In stark contrast to the network-centric view of complex disease, regression-based methods are preferred for disease prediction, especially among epidemiologists and clinical professionals. It remains controversial whether network-based methods outperform regression-based methods and, if they do, to what extent.

Methods: Simulations under different scenarios (input variables either independent or in a network relationship), as well as an application to a genome-wide association study (GWAS) of leprosy, were conducted to assess the prediction performance of four typical methods: Bayesian network, neural network, logistic regression and regression splines.

Results: The simulation results reveal that the Bayesian network performed better when the variables were in a network relationship or in a chain structure. For the special wheel network structure, logistic regression performed comparably to the other methods. In the application to the GWAS of leprosy, the Bayesian network still outperformed the other methods.

Conclusion: Although regression-based methods remain popular and widely used, network-based approaches deserve more attention, since they capture the complex relationships between variables.
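The abstract does not state the simulation parameters or the performance metric. The Python below (standard library only) is a minimal sketch of the simulate-fit-score loop it describes. It assumes a binary chain X1 -> X2 -> X3 -> Y in which each child copies its parent with probability 0.8, and it uses the area under the ROC curve (AUC) as the discrimination measure. On that data it compares a Bayesian network classifier, whose chain structure is assumed known rather than learned, with main-effects logistic regression. All function names (`simulate_chain`, `fit_chain_bn`, `fit_logistic`, `predict_logistic`, `auc`) are hypothetical, and none of this reproduces the authors' design.

```python
import math
import random


def simulate_chain(n, seed=0, keep=0.8):
    """Draw n rows from a hypothetical binary chain X1 -> X2 -> X3 -> Y.

    Each child copies its parent with probability `keep` and flips it
    otherwise. These settings are illustrative, not the study's design.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = 1 if rng.random() < 0.5 else 0
        x2 = x1 if rng.random() < keep else 1 - x1
        x3 = x2 if rng.random() < keep else 1 - x2
        y = x3 if rng.random() < keep else 1 - x3
        rows.append(((x1, x2, x3), y))
    return rows


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control (ties count 1/2)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def fit_chain_bn(rows):
    """Known-structure Bayesian network classifier for the chain.

    Y's Markov blanket in X1 -> X2 -> X3 -> Y is {X3}, so P(Y=1 | X1, X2, X3)
    reduces to P(Y=1 | X3), estimated here by Laplace-smoothed counts.
    """
    counts = {0: [1, 2], 1: [1, 2]}  # x3 -> [cases + 1, rows + 2]
    for (_, _, x3), y in rows:
        counts[x3][0] += y
        counts[x3][1] += 1
    return {v: c[0] / c[1] for v, c in counts.items()}


def predict_logistic(w, x):
    """Predicted probability from intercept w[0] and weights w[1:]."""
    z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.5, epochs=300):
    """Main-effects logistic regression fitted by batch gradient ascent on the log-likelihood."""
    w = [0.0] * 4  # intercept, then one weight per input
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * 4
        for x, y in rows:
            err = y - predict_logistic(w, x)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi + lr * g / n for wi, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    train = simulate_chain(600, seed=1)
    test = simulate_chain(400, seed=2)
    labels = [y for _, y in test]
    bn = fit_chain_bn(train)
    w = fit_logistic(train)
    auc_bn = auc([bn[x[2]] for x, _ in test], labels)
    auc_lr = auc([predict_logistic(w, x) for x, _ in test], labels)
    print(f"Bayesian network AUC: {auc_bn:.3f}")
    print(f"Logistic regression AUC: {auc_lr:.3f}")
```

Because Y's Markov blanket in this chain is only X3, the network classifier needs a single two-row conditional probability table. Logistic regression, by contrast, has to learn that X1 and X2 add nothing once X3 is known. In a setting this simple the two AUCs are likely to land close together, so the sketch illustrates the mechanics of the comparison rather than the study's results. The neural network and regression-spline arms are omitted.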
