Detection of biomarkers for Hepatocellular Carcinoma using a hybrid univariate gene selection methods

BackgroundDiscovering new biomarkers has a great role in improving early diagnosis of Hepatocellular carcinoma (HCC). The experimental determination of biomarkers needs a lot of time and money. This motivates this work to use in-silico prediction of biomarkers to reduce the number of experiments required for detecting new ones. This is achieved by extracting the most representative genes in microarrays of HCC.ResultsIn this work, we provide a method for extracting the differential expressed genes, up regulated ones, that can be considered candidate biomarkers in high throughput microarrays of HCC. We examine the power of several gene selection methods (such as Pearson’s correlation coefficient, Cosine coefficient, Euclidean distance, Mutual information and Entropy with different estimators) in selecting informative genes. A biological interpretation of the highly ranked genes is done using KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, ENTREZ and DAVID (Database for Annotation, Visualization, and Integrated Discovery) databases. The top ten genes selected using Pearson’s correlation coefficient and Cosine coefficient contained six genes that have been implicated in cancer (often multiple cancers) genesis in previous studies. A fewer number of genes were obtained by the other methods (4 genes using Mutual information, 3genes using Euclidean distance and only one gene using Entropy). A better result was obtained by the utilization of a hybrid approach based on intersecting the highly ranked genes in the output of all investigated methods. This hybrid combination yielded seven genes (2 genes for HCC and 5 genes in different types of cancer) in the top ten genes of the list of intersected genes.ConclusionsTo strengthen the effectiveness of the univariate selection methods, we propose a hybrid approach by intersecting several of these methods in a cascaded manner. This approach surpasses all of univariate selection methods when used individually according to biological interpretation and the examination of gene expression signal profiles.

[1]  Thibault Helleputte,et al.  Robust biomarker identification for cancer diagnosis with ensemble feature selection methods , 2010, Bioinform..

[2]  Brad T. Sherman,et al.  DAVID: Database for Annotation, Visualization, and Integrated Discovery , 2003, Genome Biology.

[3]  Karan P. Singh,et al.  Theoretical Biology and Medical Modelling , 2007 .

[4]  Liam Paninski,et al.  Estimation of Entropy and Mutual Information , 2003, Neural Computation.

[5]  Korbinian Strimmer,et al.  An empirical Bayes approach to inferring large-scale gene association networks , 2005, Bioinform..

[6]  Alex E. Lash,et al.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository , 2002, Nucleic Acids Res..

[7]  Ding‐Shinn Chen,et al.  Identification of complement C3a as a candidate biomarker in human chronic hepatitis C and HCV‐related hepatocellular carcinoma using a proteomics approach , 2006, Proteomics.

[8]  Gianluca Bontempi,et al.  minet: A R/Bioconductor Package for Inferring Large Transcriptional Networks Using Mutual Information , 2008, BMC Bioinformatics.

[9]  Tatiana A. Tatusova,et al.  Entrez Gene: gene-centered information at NCBI , 2004, Nucleic Acids Res..

[10]  Zhu-Hong You,et al.  A Novel Hybrid Method of Gene Selection and Its Application on Tumor Classification , 2008, ICIC.

[11]  Marcel J. T. Reinders,et al.  A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets , 2006, BMC Bioinformatics.

[12]  Sung-Bae Cho,et al.  Machine Learning in DNA Microarray Analysis for Cancer Classification , 2003, APBC.

[13]  William Bialek,et al.  Entropy and information in neural spike trains: progress on the sampling problem. , 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[14]  Hiroyuki Ogata,et al.  KEGG: Kyoto Encyclopedia of Genes and Genomes , 1999, Nucleic Acids Res..

[15]  Supplemental Information 2: Kyoto Encyclopedia of genes and genomes. , 2022 .

[16]  K. Strimmer,et al.  Statistical Applications in Genetics and Molecular Biology A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics , 2011 .

[17]  M. I. Saripan,et al.  A combinatory algorithm of univariate and multivariate gene selection , 2009 .

[18]  Carsten O. Daub,et al.  Estimating mutual information using B-spline functions – an improved similarity measure for analysing gene expression data , 2004, BMC Bioinformatics.

[19]  W. Xiao,et al.  Interference of globin genes with biomarker discovery for allograft rejection in peripheral blood samples. , 2008, Physiological genomics.

[20]  Jean YH Yang,et al.  Bioconductor: open software development for computational biology and bioinformatics , 2004, Genome Biology.

[21]  Frank Emmert-Streib,et al.  Unite and conquer: univariate and multivariate approaches for finding differentially expressed gene sets , 2009, Bioinform..