A Framework for the Automatic Combination and Evaluation of Gene Selection Methods

High-throughput RNA sequencing technologies produce large gene expression datasets whose analysis leads to a better understanding and treatment of diseases such as cancer. The data’s high dimensionality poses challenges for computational analysis, which are commonly addressed by applying gene selection. Traditional gene selection methods rely on the expression data alone. In contrast, integrative approaches incorporate curated biological information from external knowledge bases into the gene selection process, which improves result accuracy and reduces computational complexity.
