Multiclass classification for skin cancer profiling based on the integration of heterogeneous gene expression series

Many studies that apply microarray technology to characterize the pathological states of a disease fail to reach statistically significant results. This is largely due to the small number of samples analysed and to the limited number of states or pathologies usually addressed. Moreover, the influence of potential deviations on gene expression quantification is usually disregarded. Despite continuous changes in the omics sciences, reflected for instance in the emergence of new Next-Generation Sequencing technologies, the vast amount of gene expression microarray data already available should be properly exploited. This work therefore proposes a novel methodological approach that integrates several heterogeneous skin cancer series and then designs a multiclass classifier. The approach aims to provide clinicians with an intelligent diagnosis support tool, based on a robust set of selected biomarkers, that simultaneously distinguishes among different cancer-related skin states. To achieve this, microarray datasets from two manufacturers, Affymetrix and Illumina, were combined across platforms. This integration is expected to strengthen the statistical robustness of the study and to support the discovery of highly reliable skin cancer biomarkers. Specifically, the designed pipeline identified a small subset of 17 differentially expressed genes (DEGs) with which to distinguish among the 7 skin states involved. These genes were obtained after assessing a number of potential batch effects on the gene expression data. Their biological interpretation was examined in the specialised literature to understand how they relate to skin cancer. Finally, to assess their potential effectiveness in cancer diagnosis, a cross-validated Support Vector Machine (SVM) classification including feature ranking was performed. The overall accuracy in recognising the 7 cancer-related skin states exceeded 92%. The proposed integration scheme is expected to allow co-integration with other state-of-the-art technologies such as RNA-seq.
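To make the notion of a batch effect between series more concrete, the following is a minimal sketch of one very simple adjustment: centering each gene on its own mean within each series before the series are merged. This is not the procedure used in the work summarised above, whose batch-effect assessment is not detailed here. The function name `center_by_batch`, the sample identifiers, and the toy values are all hypothetical, and the sketch assumes log-scale expression values with genes listed in the same order for every sample.

```python
from collections import defaultdict
from statistics import mean


def center_by_batch(expression, batches):
    """Subtract each gene's per-batch mean from every sample in that batch.

    expression: dict sample_id -> list of log-scale expression values
                (same gene order for every sample)
    batches:    dict sample_id -> batch label (e.g. the series of origin)
    """
    # Group sample identifiers by their batch label.
    samples_in_batch = defaultdict(list)
    for sample_id in expression:
        samples_in_batch[batches[sample_id]].append(sample_id)

    centered = {}
    for sample_ids in samples_in_batch.values():
        n_genes = len(expression[sample_ids[0]])
        # Mean of every gene across the samples of this batch only.
        gene_means = [
            mean(expression[s][g] for s in sample_ids) for g in range(n_genes)
        ]
        # Remove the batch-specific offset from each sample.
        for s in sample_ids:
            centered[s] = [v - m for v, m in zip(expression[s], gene_means)]
    return centered


# Hypothetical toy data: two genes, two series from different platforms.
toy_expression = {
    "affy_1": [8.0, 5.0],
    "affy_2": [10.0, 7.0],
    "illu_1": [3.0, 1.0],
    "illu_2": [5.0, 3.0],
}
toy_batches = {"affy_1": "A", "affy_2": "A", "illu_1": "B", "illu_2": "B"}
toy_centered = center_by_batch(toy_expression, toy_batches)
```

In the toy data the two series differ only by a constant offset per gene, so after centering their samples become directly comparable. Note that plain mean-centering can also remove genuine biological differences when the skin states are unevenly distributed across series, which is one reason dedicated batch-adjustment methods are generally preferred in practice.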
