Improving the Efficiency of Biomarker Identification Using Biological Knowledge

Identifying and validating biomarkers from high-throughput gene expression data is important for understanding and treating cancer. Typically, we identify candidate biomarkers as features that are differentially expressed between two or more classes of samples. Many feature selection methods rank features by some measure of differential expression. However, interpreting the resulting rankings is difficult because the many existing algorithms and metrics may each produce different results. Consequently, a feature ranking metric may work well on some datasets but perform considerably worse on others. We propose a method to choose an optimal feature ranking metric for each individual dataset. A metric is optimal if, for a particular dataset, it favorably ranks features that are known to be relevant biomarkers. Extensive knowledge of biomarker candidates is available in public databases and the literature. Using this knowledge, we can choose the ranking metric that produces the most biologically meaningful results. In this paper, we first describe a framework for assessing the ability of a ranking metric to detect known relevant biomarkers. We then apply this method to clinical renal cancer microarray data to choose an optimal metric and identify several candidate biomarkers.
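
The idea of choosing a ranking metric by how favorably it ranks known biomarkers can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three candidate metrics, the two-class 0/1 labels, the use of the mean rank of known biomarkers as the quality score, and all function and variable names (`choose_metric`, `mean_known_rank`, `known_idx`). The paper's own framework may score metrics differently.

```python
import numpy as np

# Hypothetical candidate ranking metrics. Each takes an expression matrix X
# (samples x genes, assumed log-scale) and binary labels y (0/1), and returns
# one non-negative score per gene (higher = more differentially expressed).

def fold_change(X, y):
    # Absolute difference of class means.
    return np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))

def t_statistic(X, y):
    # Absolute Welch-style t-statistic.
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def signal_to_noise(X, y):
    # Absolute mean difference divided by the sum of class standard deviations.
    a, b = X[y == 1], X[y == 0]
    noise = a.std(axis=0, ddof=1) + b.std(axis=0, ddof=1)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (noise + 1e-12)

METRICS = {
    "fold_change": fold_change,
    "t_statistic": t_statistic,
    "signal_to_noise": signal_to_noise,
}

def mean_known_rank(scores, known_idx):
    # Convert scores to ranks (rank 1 = highest score) and average the ranks
    # of the genes listed as known biomarkers. Lower is better.
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return float(ranks[np.asarray(known_idx)].mean())

def choose_metric(X, y, known_idx, metrics=METRICS):
    # Score every candidate metric on this dataset and pick the one that
    # ranks the known biomarkers most favorably.
    results = {name: mean_known_rank(fn(X, y), known_idx)
               for name, fn in metrics.items()}
    best = min(results, key=results.get)
    return best, results
```

Calling `choose_metric(X, y, known_idx)` returns the name of the best-scoring metric together with the mean known-biomarker rank for every candidate. In practice, the indices in `known_idx` would come from a curated list of genes drawn from public databases or the literature, and a resampling scheme would likely be needed to keep the choice from overfitting a small sample.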
