Computational intelligence for microarray data and biomedical image analysis for the early diagnosis of breast cancer

The objective of this paper was to perform a comparative analysis of computational intelligence algorithms for identifying breast cancer in its early stages. Two types of data representation were considered: microarray-based and medical-imaging-based. In contrast to previous research, this study also considered the imbalanced nature of these data. When accuracy was used as the performance measure, the SMO (sequential minimal optimization) support vector machine algorithm outperformed the other algorithms on the majority of the test datasets, especially the microarray-based ones. When the imbalanced character of the data was taken into account, the Naive Bayes algorithm achieved a high true positive rate (TPR). Applying SMOTE, a well-known oversampling technique for imbalanced data, produced a notable performance improvement for the J48 decision tree, while the performance of SMO remained comparable on the majority of the datasets. Overall, the results indicate that SMO is the most promising candidate for the microarray and image datasets considered in this research.
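The abstract contrasts accuracy with TPR on imbalanced data and names SMOTE without showing either idea. The following is a minimal, stdlib-only Python sketch of both. It is not the authors' experimental setup, which the abstract describes only in terms of the SMO, J48 and Naive Bayes classifiers. The function names `accuracy_and_tpr` and `smote` are hypothetical. The interpolation follows the SMOTE procedure of Chawla et al. in simplified form: continuous features only and no per-sample oversampling rate.

```python
import math
import random


def accuracy_and_tpr(y_true, y_pred, positive=1):
    """Return (accuracy, true positive rate) for binary labels.

    TPR = TP / (TP + FN): the share of actual positives (e.g. cancer cases)
    that the classifier labels as positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, tpr


def smote(minority, n_synthetic, k=5, rng=None):
    """Create n_synthetic minority samples by SMOTE-style interpolation.

    Each new sample is x + gap * (nn - x), where x is a random minority
    sample, nn is one of its k nearest minority neighbours (Euclidean
    distance) and gap is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (minority[j] for j in range(len(minority)) if j != i),
            key=lambda m: math.dist(x, m),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nn)))
    return synthetic


if __name__ == "__main__":
    # A classifier that always predicts "healthy" on 95 negatives / 5 positives
    print(accuracy_and_tpr([0] * 95 + [1] * 5, [0] * 100))  # (0.95, 0.0)
    # Oversample a toy two-feature minority class with three members
    print(smote([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)], n_synthetic=4, k=2))
```

In the toy example, accuracy is 95% while TPR is 0. This shows how ranking classifiers by accuracy alone can favour one that misses every positive case. A real pipeline would more likely use an established implementation, such as the `SMOTE` class in imbalanced-learn, rather than this sketch.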
