Mining Concepts from Large SAGE Gene Expression Matrices

One of the crucial needs in post-genomic research is to analyze expression matrices (e.g., SAGE and microarray data) to identify a priori interesting sets of genes, e.g., sets of genes that are frequently co-regulated. Such matrices provide expression values for given biological situations (the rows) and given genes (the columns). The inductive database framework supports knowledge discovery processes by means of sequences of queries that concern both data processing and pattern querying (extraction, post-processing). We provide a simple formalization of a relevant pattern domain (language of patterns, evaluation functions and primitive constraints) that has proved useful for specifying various analysis tasks. Recent algorithmic results w.r.t. the efficient evaluation (constraint-based mining) of the so-called inductive queries are emphasized and illustrated on a 90 × 12 636 human SAGE expression matrix.
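
To make the notion of a concept with a frequency constraint concrete, the following is a minimal sketch on a toy Boolean matrix. It assumes expression values have already been discretized into "over-expressed or not", and it uses naive enumeration of gene sets rather than any of the efficient constraint-based algorithms the abstract refers to. The data and the names `matrix`, `support`, `closure` and `frequent_concepts` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Toy Boolean expression matrix (hypothetical data): each biological
# situation maps to the set of genes considered over-expressed in it.
matrix = {
    "s1": {"g1", "g2", "g3"},
    "s2": {"g1", "g2"},
    "s3": {"g2", "g3"},
    "s4": {"g1", "g2", "g3"},
}


def support(genes, matrix):
    """Return the situations in which every gene of `genes` is expressed."""
    return frozenset(s for s, expressed in matrix.items() if genes <= expressed)


def closure(situations, matrix):
    """Return the genes expressed in every situation of `situations`."""
    result = set().union(*matrix.values())
    for s in situations:
        result &= matrix[s]
    return frozenset(result)


def frequent_concepts(matrix, min_freq):
    """Naively enumerate (gene set, situation set) concepts.

    A pair is kept when the gene set is closed (it equals the closure of
    its supporting situations) and is supported by at least `min_freq`
    situations. Exponential in the number of genes: toy sizes only.
    """
    all_genes = sorted(set().union(*matrix.values()))
    concepts = set()
    for k in range(len(all_genes) + 1):
        for subset in combinations(all_genes, k):
            situations = support(frozenset(subset), matrix)
            if len(situations) >= min_freq:
                concepts.add((closure(situations, matrix), situations))
    return concepts


if __name__ == "__main__":
    for genes, situations in sorted(
        frequent_concepts(matrix, min_freq=2),
        key=lambda c: (len(c[0]), sorted(c[0])),
    ):
        print(sorted(genes), sorted(situations))
```

Each printed pair associates a maximal set of genes with the maximal set of situations in which they are all expressed. Raising `min_freq` prunes concepts supported by too few situations, which is the simplest kind of primitive constraint an inductive query could impose.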
