A feature selection approach for identification of signature genes from SAGE data

Background: One goal of gene expression profiling is to identify signature genes that robustly distinguish different types or grades of tumors. Several tumor classifiers based on expression profiling have been proposed using microarray techniques. Because the probabilistic models of microarray and SAGE technologies differ in important ways, techniques suited to selecting specific genes from SAGE measurements are needed.

Results: A new framework is proposed for selecting specific genes that distinguish different biological states based on the analysis of SAGE data. The framework applies the bolstered error to identify strong genes that separate the biological states in a feature space defined by the gene expression of a training set. Among all gene groups selected by the strong genes method, credibility intervals defined from a probabilistic model of SAGE measurements are used to identify the genes that distinguish the different states most reliably. A score that takes into account both the credibility and the bolstered error values is proposed to rank the groups of considered genes. Results obtained using SAGE data from gliomas are presented and corroborate the introduced methodology.

Conclusion: The model representing counting data, such as SAGE, provides additional statistical information that allows a more robust analysis, and the methodology described in the paper incorporates this information. The introduced method is suitable for identifying signature genes that lead to a good separation of the biological states using SAGE, and it may be adapted for other counting methods such as Massively Parallel Signature Sequencing (MPSS) or the more recent Sequencing-By-Synthesis (SBS) technique. Some of the genes identified by the proposed method may be useful for building classifiers.
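
As a rough illustration of the bolstered error used in the Results, the sketch below computes a bolstered resubstitution estimate for a single gene and a simple threshold classifier. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the one-dimensional threshold rule, and the fixed Gaussian kernel width `sigma` are all hypothetical choices made here for illustration (the original bolstering method derives kernel widths from the training data, and the paper combines the error with credibility intervals, which this sketch does not cover).

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bolstered_error_threshold(values, labels, threshold, sigma):
    """Bolstered resubstitution error for a 1-D threshold classifier.

    The classifier predicts class 1 when the expression value is above
    `threshold` and class 0 otherwise. Each training point is replaced by
    a Gaussian kernel centred on it (standard deviation `sigma`, assumed
    fixed here), and its error contribution is the kernel mass that falls
    on the wrong side of the threshold.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if len(values) != len(labels) or not values:
        raise ValueError("values and labels must be non-empty and equal in length")

    total = 0.0
    for x, y in zip(values, labels):
        # Kernel mass at or below the threshold (region predicted as class 0).
        mass_below = normal_cdf((threshold - x) / sigma)
        # A class-0 point errs with the mass above the threshold,
        # a class-1 point errs with the mass below it.
        total += (1.0 - mass_below) if y == 0 else mass_below
    return total / len(values)


# Hypothetical toy data: expression of one gene in two biological states.
expression = [1.0, 2.0, 8.0, 9.0]
state = [0, 0, 1, 1]
error = bolstered_error_threshold(expression, state, threshold=5.0, sigma=1.0)
```

Unlike plain resubstitution, which would report zero error for these well-separated points, the bolstered estimate is small but positive, and it grows as points move closer to the decision boundary. That sensitivity is what makes it useful for ranking candidate genes from small training sets.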
