Discovering Disease-specific Biomarker Genes for Cancer Diagnosis and Prognosis

The large amount of available microarray data provides a great opportunity to identify gene expression profiles (GEPs) in different tissues or disease states. Disease-specific biomarker genes likely share GEPs that are distinct in disease samples as compared with normal samples. The similarity of GEPs may be evaluated by the Pearson correlation coefficient (PCC), and the distinctness of GEPs may be assessed by the Kolmogorov-Smirnov distance (KSD). In this study, we used the PCC and KSD metrics for GEPs to identify disease-specific (here, cancer-specific) biomarkers. We first analyzed and compared GEPs using microarray datasets for smoking and lung cancer. We found that the number of genes with highly different GEPs between the compared groups was much larger in the smoking dataset than in the lung cancer dataset; this observation was further supported when we compared GEPs in the smoking dataset with those in prostate cancer datasets. Moreover, our Gene Ontology analysis revealed that the top-ranked biomarker candidate genes for prostate cancer were highly enriched in molecular function categories such as ‘cytoskeletal protein binding’ and biological process categories such as ‘muscle contraction’. Finally, we used two genes, ACTC1 (encoding an actin subunit) and HPN (encoding hepsin), to demonstrate the feasibility of diagnosing and monitoring prostate cancer using the expression intensity histograms of marker genes. In summary, our results suggest that this approach may prove promising for diagnosing and monitoring patients who come to the clinic for screening or evaluation of a disease state, including cancer.
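As a rough illustration of the two metrics named above, the following minimal sketch computes a PCC and a two-sample KSD in plain Python. It rests on assumptions not stated in the abstract: a GEP is represented here as a plain list of expression intensities, the PCC is taken between two genes measured across the same samples, and the KSD is taken between one gene's intensities in normal and disease samples. The function names and the example values are hypothetical, and the study's actual preprocessing (for example, how histograms are binned) is not reproduced.

```python
import math
from bisect import bisect_right


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance.

    This is the largest absolute difference between the empirical
    cumulative distribution functions (ECDFs) of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    distance = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for value in set(a) | set(b):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Hypothetical expression intensities (illustrative values, not study data).
gene_a = [1.0, 2.0, 3.0, 4.0]          # gene A across four samples
gene_b = [2.0, 4.0, 6.0, 8.0]          # gene B across the same four samples
normal = [2.1, 2.4, 2.2, 2.6, 2.3]     # one gene in normal samples
disease = [5.0, 5.3, 4.8, 5.1, 5.4]    # the same gene in disease samples

print(pearson_correlation(gene_a, gene_b))  # 1.0 (perfectly correlated)
print(ks_distance(normal, disease))         # 1.0 (fully separated)
```

Under these assumptions, a candidate biomarker would be a gene whose KSD between disease and normal samples is large, while a high PCC between two genes indicates that their profiles vary together across samples.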
