DSP based entropy estimation for identification and classification of Homo sapiens cancer genes

With the advancement of microarray technology in recent years, public repositories such as NCBI, NIH and NHGRI offer an extensive range of information-rich raw genomic data. The easy accessibility of these data attracts researchers from diverse disciplines to process them for the benefit of society. In the field of signal processing, this has given rise to a new research area, genomic signal processing (GSP). GSP processes genes, proteins and DNA sequences using various signal processing methodologies to extract the information hidden in them. As some genetic abnormalities develop into cancer, proper understanding and analysis of genes and proteins may open a new horizon in cancer genomics. In genomic signal processing, exact identification and classification of diseased genes is a great challenge for researchers. The present paper therefore addresses the task of gene identification and classification. To solve this problem, statistical methods such as entropy estimation and mutual information calculation are adopted along with DSP techniques. The Rayleigh distribution of the estimated entropy of a gene is treated as the identifier of healthy and cancerous Homo sapiens genes. Once the cancer genes are identified, a mutual information estimator based on their minimum entropy is used as a classifier to detect different types of cancer genes. The algorithms are successfully tested on several healthy and cancerous prostate, breast and colon genes collected from NCBI GenBank.
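The two statistical quantities named in the abstract, entropy and mutual information, can be illustrated with a minimal Python sketch. This is not the paper's estimator. It assumes simple plug-in (frequency-count) estimates over nucleotide symbols, and the function names `shannon_entropy`, `windowed_entropy` and `mutual_information` are hypothetical. The Rayleigh-distribution fitting and the minimum-entropy classifier described above are not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Plug-in Shannon entropy, in bits per symbol, of a symbol sequence."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def windowed_entropy(seq, window):
    """Entropy of each non-overlapping window of length `window`.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    return [shannon_entropy(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]


def mutual_information(x, y):
    """Plug-in mutual information, in bits, of two aligned sequences.

    Uses the identity I(X;Y) = H(X) + H(Y) - H(X,Y).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    joint = list(zip(x, y))  # paired symbols give the joint distribution
    return shannon_entropy(x) + shannon_entropy(y) - shannon_entropy(joint)
```

For a sequence in which the four nucleotides are equally frequent, such as `"ACGT"`, `shannon_entropy` gives 2 bits per symbol, the maximum for a four-letter alphabet. Plug-in estimates of this kind are biased for short sequences, so a practical study would likely need a bias-corrected estimator.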
