Identifying good diagnostic gene groups from gene expression profiles using the concept of emerging patterns

MOTIVATIONS AND RESULTS: Gene groups that are significantly related to a disease can be detected by conducting a series of gene expression experiments. This work aims to discover a special type of gene group with the following property: each member gene has its own pre-determined interval of expression level, and the expression levels of all member genes fall within their respective intervals with high frequency in one class of cells, but never all do so simultaneously in the other class of cells. We call these gene groups emerging patterns, to emphasize the change in the patterns' frequency between the two classes of cells. We use effective discretization and gene selection methods to obtain the most discriminatory genes, and efficient algorithms to derive the patterns from these genes. In our studies on the ALL/AML dataset and the colon tumor dataset, some patterns, consisting of one or more genes, reach a frequency of 90% or even 100%. In other words, they nearly or fully dominate one class of cells, even though they rarely occur in the other class. The discovered patterns are used to classify new cells with a higher accuracy than other reported methods. Based on these patterns, we also conjecture the possibility of a personalized treatment plan that converts colon tumor cells into normal cells by modulating the expression levels of a few genes.
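
The defining property of an emerging pattern can be illustrated with a minimal sketch. It is not the mining algorithm used in this work; it only checks a given pattern against two classes of cells. The gene names (`g1`, `g2`), the interval bounds, the toy expression values, and the `min_freq` threshold are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]   # half-open interval [low, high); an assumed convention
Pattern = Dict[str, Interval]    # gene name -> expression-level interval
Cell = Dict[str, float]          # gene name -> measured expression level


def matches(cell: Cell, pattern: Pattern) -> bool:
    """True only if every gene of the pattern lies in its own interval."""
    return all(low <= cell[gene] < high for gene, (low, high) in pattern.items())


def frequency(cells: List[Cell], pattern: Pattern) -> float:
    """Fraction of cells in a class that match the whole pattern."""
    if not cells:
        return 0.0
    return sum(matches(cell, pattern) for cell in cells) / len(cells)


def is_emerging(pattern: Pattern, home: List[Cell], other: List[Cell],
                min_freq: float = 0.9) -> bool:
    """Frequent in the home class and never fully matched in the other class."""
    return frequency(home, pattern) >= min_freq and frequency(other, pattern) == 0.0


# Hypothetical toy data: two genes, two classes of cells.
pattern: Pattern = {"g1": (0.0, 50.0), "g2": (100.0, float("inf"))}
class_a: List[Cell] = [
    {"g1": 10.0, "g2": 150.0},
    {"g1": 20.0, "g2": 200.0},
    {"g1": 30.0, "g2": 120.0},
    {"g1": 70.0, "g2": 130.0},   # g1 is outside its interval
]
class_b: List[Cell] = [
    {"g1": 10.0, "g2": 50.0},    # g1 matches alone, g2 does not
    {"g1": 80.0, "g2": 150.0},   # g2 matches alone, g1 does not
    {"g1": 90.0, "g2": 20.0},
]

print(frequency(class_a, pattern), frequency(class_b, pattern))
```

In this toy data, individual genes of the pattern do fall in their intervals in some cells of the second class, but never both at once, which is the sense in which the member genes are never found together in their intervals in the other class.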
