CliDaPa: A new approach to combining clinical data with DNA microarrays

Traditionally, clinical data have been the only source of information used to diagnose diseases. Nowadays, other types of information, such as various forms of omics data (e.g. DNA microarrays), are also taken into account to improve diagnosis and even prognosis in many diseases. This paper proposes a new approach, called CliDaPa, for efficiently combining both sources of information, namely clinical data and gene expression data, in order to further improve estimations. In this approach, patients are first divided into clusters (represented as a decision tree) according to their clinical information, which identifies groups of patients with similar behavior. Each group can then be studied and classified separately, using only gene expression data, with different supervised classification methods, such as decision trees, Bayesian networks or lazy induction learning. To validate the method, two datasets on breast cancer, a disease with high social impact, have been used. Both internal (0.632 bootstrap) and external validations have been carried out for the proposed approach. In both validations, accuracy improved over standard methods that use clinical data or gene expression data alone. The CliDaPa algorithm therefore fulfills our proposed objectives.
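The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single hand-picked clinical split (a hypothetical `age` feature with threshold 60) in place of a learned decision tree, a 1-nearest-neighbour classifier as a stand-in for the lazy learner, and a simplified 0.632 bootstrap that averages the out-of-bag error per round. All function names and the toy patients are invented for illustration.

```python
import random

def split_by_clinical(patients, feature, threshold):
    """Stage 1 (simplified): route patients to a leaf using one clinical feature.
    Assumption: a single fixed split stands in for the learned clinical tree."""
    leaves = {"low": [], "high": []}
    for p in patients:
        key = "low" if p["clinical"][feature] <= threshold else "high"
        leaves[key].append(p)
    return leaves

def nearest_neighbor_predict(train, genes):
    """Stage 2 (simplified): 1-NN on gene expression, squared Euclidean distance."""
    def dist(p):
        return sum((a - b) ** 2 for a, b in zip(p["genes"], genes))
    return min(train, key=dist)["label"]

def two_stage_predict(leaves, feature, threshold, patient):
    """Pick the leaf from clinical data, then classify with gene expression only."""
    key = "low" if patient["clinical"][feature] <= threshold else "high"
    return nearest_neighbor_predict(leaves[key], patient["genes"])

def error_rate(train, test):
    """Fraction of test patients misclassified by 1-NN trained on `train`."""
    wrong = sum(nearest_neighbor_predict(train, p["genes"]) != p["label"] for p in test)
    return wrong / len(test)

def bootstrap_632(samples, rounds=50, seed=0):
    """Simplified 0.632 bootstrap: 0.368 * resubstitution + 0.632 * mean OOB error."""
    rng = random.Random(seed)
    resub = error_rate(samples, samples)
    oob_errors = []
    for _ in range(rounds):
        idx = [rng.randrange(len(samples)) for _ in samples]
        chosen = set(idx)
        oob = [s for i, s in enumerate(samples) if i not in chosen]
        if oob:  # skip the rare round in which every sample was drawn
            oob_errors.append(error_rate([samples[i] for i in idx], oob))
    if not oob_errors:
        return resub
    return 0.368 * resub + 0.632 * sum(oob_errors) / len(oob_errors)

# Toy, invented patients: the first gene relates to the label in opposite
# directions in the two clinical groups.
patients = [
    {"clinical": {"age": 40}, "genes": [0.9, 0.1], "label": 1},
    {"clinical": {"age": 45}, "genes": [0.1, 0.2], "label": 0},
    {"clinical": {"age": 70}, "genes": [0.9, 0.2], "label": 0},
    {"clinical": {"age": 75}, "genes": [0.1, 0.1], "label": 1},
]
leaves = split_by_clinical(patients, "age", 60)
older = {"clinical": {"age": 72}, "genes": [0.8, 0.15]}
younger = {"clinical": {"age": 42}, "genes": [0.8, 0.15]}
pred_older = two_stage_predict(leaves, "age", 60, older)
pred_younger = two_stage_predict(leaves, "age", 60, younger)
estimate = bootstrap_632(patients)
```

In this toy setting the same gene expression profile receives a different label depending on the clinical group, which is the kind of structure a per-group classifier can exploit and a single global classifier on gene expression alone cannot.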
