A novel filter feature selection method for paired microarray expression data analysis

In recent years, a large number of microarray data sets have been produced, each measuring tens of thousands of genes. Feature selection has become an effective tool for selecting informative genes. However, few feature selection methods account for paired samples, which have become increasingly common in recent experiments. Here, we propose a new filter feature selection method for the analysis of paired microarray data sets. It uses the fold change instead of the subtraction used in the original approach, measures statistical significance using the q-value of the False Discovery Rate (FDR), and reduces the influence of redundant genes. We compare the proposed method with six existing methods in terms of predictive performance, stability of gene lists, functional stability, and functional enrichment analysis, using six kinds of paired cancer data sets. The comparison results show that the proposed method achieves better effectiveness, stability, and consistency when applied to paired data sets.
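The three ingredients named above (fold change on paired samples, an FDR-based significance measure, and suppression of redundant genes) can be illustrated with a minimal sketch. This is not the proposed method itself; it rests on several assumptions: the significance test is a sign-flip permutation test on per-pair log2 fold changes, the Benjamini-Hochberg adjusted p-value stands in for the q-value, and redundancy is handled by a greedy Pearson-correlation filter. All function names, parameters, and thresholds (`select_genes`, `alpha`, `max_corr`, `top_k`) are hypothetical.

```python
import numpy as np

def paired_log_fold_change(tumor, normal, eps=1e-9):
    """Log2 fold change per gene and per matched pair (rows: genes, columns: pairs)."""
    return np.log2((tumor + eps) / (normal + eps))

def sign_flip_pvalues(lfc, n_perm=2000, seed=0):
    """Two-sided p-values for 'mean log fold change != 0' (assumed test).

    Under the null, swapping the two members of a pair flips the sign of
    that pair's log fold change, so random sign flips give a null distribution.
    """
    rng = np.random.default_rng(seed)
    observed = np.abs(lfc.mean(axis=1))
    exceed = np.ones(lfc.shape[0])  # the observed statistic counts once
    for _ in range(n_perm):
        signs = rng.choice([-1.0, 1.0], size=lfc.shape[1])
        exceed += np.abs((lfc * signs).mean(axis=1)) >= observed
    return exceed / (n_perm + 1)

def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (a stand-in for the q-value)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def select_genes(tumor, normal, alpha=0.05, max_corr=0.9, top_k=10):
    """Rank significant genes by |mean log fold change|, skipping redundant ones."""
    lfc = paired_log_fold_change(tumor, normal)
    adj = bh_adjust(sign_flip_pvalues(lfc))
    ranking = np.argsort(-np.abs(lfc.mean(axis=1)))
    selected = []
    for g in ranking:
        if adj[g] > alpha:
            continue
        # Keep a gene only if it is not highly correlated with any kept gene.
        if all(abs(np.corrcoef(lfc[g], lfc[s])[0, 1]) < max_corr for s in selected):
            selected.append(int(g))
        if len(selected) == top_k:
            break
    return selected
```

Calling `select_genes(tumor, normal)` with two expression matrices of identical shape, where column `j` of each matrix comes from the same patient, returns the indices of the retained genes in rank order. Working on the ratio within each pair removes patient-specific baseline expression before any statistic is computed, which is the motivation for treating paired designs separately.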
