Mining Gene Expression Data using Domain Knowledge

Biology is now an information-intensive science, and research areas such as molecular biology, evolutionary biology, and environmental biology depend heavily on the availability and efficient use of information. Data mining, which brings together several techniques for analyzing very large datasets, is used to solve problems in a growing number of biological applications. This article focuses on the analysis of the transcriptome, which reflects gene activity in a given cell population at a given time. We describe research themes in transcriptomics that relate to domain knowledge in biology. We are particularly interested in how this knowledge can be efficiently combined and used during the various phases of a data mining process, in the most widely recognized applications in transcriptomics.
