A comparative study of improvements Pre-filter methods bring on feature selection using microarray data

Background: Feature selection techniques have become essential in biomarker discovery with the development of microarray technology. However, the high dimensionality of microarray data makes feature selection time-consuming. To overcome this difficulty, filtering the data according to background knowledge before applying feature selection techniques has become a hot topic in microarray analysis. Different pre-filter methods may affect the final results greatly, so it is important to evaluate them systematically.

Methods: In this paper, we compared the performance of statistical-based pre-filter methods, biological-based pre-filter methods, and their combination on microRNA-mRNA parallel expression profiles, using L1 logistic regression as the feature selection technique. Four types of data were built for both the microRNA and the mRNA expression profiles.

Results: Pre-filter methods greatly reduced the number of features for both the mRNA and the microRNA expression datasets. The features selected after the pre-filter procedures were shown to be biologically significant at levels such as biological processes and microRNA functions. Analyses of classification performance based on precision showed that pre-filter methods were necessary when the number of raw features was much larger than the number of samples. Computing time was greatly shortened in all cases after the pre-filter procedures.

Conclusions: With similar or better classification performance and fewer but biologically significant features, pre-filter-based feature selection should be considered when researchers need fast results for complex computing problems in bioinformatics.
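The two-stage idea described in the Methods (a pre-filter followed by L1 logistic regression) can be illustrated with a minimal sketch. The abstract does not state which statistic, cutoff, or solver was used, so everything specific here is an assumption for illustration only: the Welch t-statistic as the statistical pre-filter, the cutoff `t_cutoff=3.0`, the penalty `lam`, the proximal gradient solver, and the synthetic data. The biological-based pre-filter is not shown.

```python
import math
import random

def welch_t(a, b):
    """Welch t-statistic between two lists of values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)

def prefilter(X, y, t_cutoff):
    """Hypothetical statistical pre-filter: keep features with |t| >= t_cutoff."""
    keep = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if abs(welch_t(pos, neg)) >= t_cutoff:
            keep.append(j)
    return keep

def l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=300):
    """L1-penalised logistic regression fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))  # guard against overflow in exp
            err = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        for j in range(p):
            step = w[j] - lr * grad_w[j]
            # Soft-thresholding: shrinks weights and sets small ones to zero.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return w, b

# Synthetic stand-in for an expression matrix: 40 samples, 200 features,
# of which only the first 3 differ between the two classes (assumed setup).
random.seed(0)
n_samples, n_features, n_informative = 40, 200, 3
y = [i % 2 for i in range(n_samples)]
X = [[random.gauss(0.0, 1.0) + (3.0 if j < n_informative and y[i] == 1 else 0.0)
      for j in range(n_features)] for i in range(n_samples)]

keep = prefilter(X, y, t_cutoff=3.0)          # stage 1: pre-filter
X_small = [[row[j] for j in keep] for row in X]
w, b = l1_logistic(X_small, y)                # stage 2: L1 feature selection
selected = [keep[j] for j, wj in enumerate(w) if wj != 0.0]

print("features after pre-filter:", len(keep), "of", n_features)
print("features with non-zero L1 weight:", selected)
```

Because the L1 step only sees the columns that survive the pre-filter, each gradient pass costs time proportional to the number of retained features rather than to the full feature count, which is in line with the shortened computing time reported in the Results.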
