Emerging translational bioinformatics: Knowledge-guided biomarker identification for cancer diagnostics

Advances in high-throughput genomic and proteomic technology have led to a growing interest in cancer biomarkers. These biomarkers can potentially improve the accuracy of cancer subtype prediction and, subsequently, the success of therapy. In this paper, we describe emerging technology that enables translational bioinformatics by improving biomarker identification. Specifically, we present an application that uses prior knowledge to identify the most biologically relevant gene ranking algorithm. Identifying statistically and biologically relevant biomarkers from high-throughput data can be unreliable because of the nature of the data, e.g., high technical variability, small sample size, and high dimensionality. Furthermore, because few training samples are available, data-driven machine learning methods are often insufficient without the support of knowledge-based algorithms. As a case study, we apply these knowledge-driven methods to renal cancer data and identify genes that are potential biomarkers for cancer subtype classification.
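
The idea of using prior knowledge to choose among gene ranking algorithms can be illustrated with a minimal sketch. It is not the application described in this paper: the two scoring functions, the toy expression values, the placeholder gene names, and the `knowledge_score` criterion (the fraction of previously known genes recovered in the top k of a ranking) are all assumptions made for illustration.

```python
from statistics import mean, stdev

def mean_difference(a, b):
    """Absolute difference of class means (ignores within-class variance)."""
    return abs(mean(a) - mean(b))

def t_like(a, b):
    """Welch-style statistic: mean difference scaled by its standard error."""
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0

def rank_genes(data, labels, score_fn):
    """Return gene names ordered from highest to lowest score."""
    scores = {}
    for gene, values in data.items():
        a = [v for v, y in zip(values, labels) if y == 0]
        b = [v for v, y in zip(values, labels) if y == 1]
        scores[gene] = score_fn(a, b)
    return sorted(scores, key=scores.get, reverse=True)

def knowledge_score(ranking, known_genes, top_k):
    """Fraction of known genes found among the top_k ranked genes (assumed criterion)."""
    return len(set(ranking[:top_k]) & known_genes) / len(known_genes)

# Toy data (hypothetical): six samples, two subtypes labelled 0 and 1.
labels = [0, 0, 0, 1, 1, 1]
data = {
    "GENE_A": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],  # consistent shift, low variance
    "GENE_B": [2.0, 2.1, 1.9, 3.0, 3.1, 2.9],  # smaller consistent shift
    "GENE_C": [0.0, 4.0, 2.0, 3.0, 9.0, 6.0],  # large but noisy shift
    "GENE_D": [1.0, 5.0, 3.0, 4.5, 8.5, 6.5],  # large but noisy shift
    "GENE_E": [1.0, 1.2, 1.1, 1.1, 1.0, 1.2],  # no shift
}
# Prior knowledge (hypothetical): genes assumed to be previously validated.
known_genes = {"GENE_A", "GENE_B"}

methods = {"mean_difference": mean_difference, "t_like": t_like}
scores = {
    name: knowledge_score(rank_genes(data, labels, fn), known_genes, top_k=2)
    for name, fn in methods.items()
}
# Select the ranking method that best recovers the known genes.
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy setting the variance-aware statistic places the two known genes at the top, while the raw mean difference favors the noisy genes, so the knowledge criterion selects the former. With real data, the known-gene list and the choice of k would need to come from curated sources and would strongly affect the outcome.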
