Reducing the Complexity of Complex Gene Coexpression Networks by Coupling Multiweighted Labeling with Topological Analysis

Undirected gene coexpression networks obtained from experimental expression data, coupled with efficient computational procedures, are increasingly used to identify potentially relevant biological information (e.g., biomarkers) for a particular disease. However, coexpression networks built from experimental expression data are in general large, highly connected networks with a high number of false-positive interactions (nodes and edges). To infer relevant information, the network must be properly filtered and its complexity reduced. Given the complexity and the multivariate nature of the information contained in the network, this requires developing and applying efficient feature selection algorithms that can exploit the topological characteristics of the network to identify relevant nodes and edges. This paper proposes an efficient multivariate filter designed to analyze the topological properties of a coexpression network in order to identify potentially relevant genes for a given disease. The algorithm has been tested on three datasets for three well-known and widely studied diseases: acute myeloid leukemia, breast cancer, and diffuse large B-cell lymphoma. Results have been validated against bibliographic data automatically mined with the ProteinQuest literature mining tool.
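
The general idea of building a coexpression network and then pruning it by a topological property can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper: it assumes Pearson correlation as the coexpression measure, node degree as the only topological criterion, and an arbitrary threshold. The function names (`pearson`, `coexpression_edges`, `filter_by_degree`) and the toy expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant profile has zero variance; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(expr, threshold):
    """Undirected weighted edges between genes with |correlation| >= threshold."""
    edges = {}
    for g, h in combinations(sorted(expr), 2):
        r = pearson(expr[g], expr[h])
        if abs(r) >= threshold:
            edges[(g, h)] = r
    return edges


def filter_by_degree(edges, min_degree):
    """Keep only edges whose two endpoints both have degree >= min_degree."""
    degree = {}
    for g, h in edges:
        degree[g] = degree.get(g, 0) + 1
        degree[h] = degree.get(h, 0) + 1
    return {
        (g, h): w
        for (g, h), w in edges.items()
        if degree[g] >= min_degree and degree[h] >= min_degree
    }


# Toy expression profiles (hypothetical values, one list of samples per gene).
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 2.0, 4.0],
}
edges = coexpression_edges(expr, threshold=0.9)  # threshold is an assumption
core = filter_by_degree(edges, min_degree=2)
```

In this toy input, gene D correlates with the other genes at only 0.8 in absolute value, so it falls below the assumed threshold and never enters the network, while A, B, and C form a fully connected triangle that survives the degree filter.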
