Rival penalized competitive learning (RPCL): a topology-determining algorithm for analyzing gene expression data

DNA arrays have become the immediate choice for analyzing large-scale expression measurements. Understanding the expression patterns of genes provides functional information on newly identified genes through computational approaches. A gene's expression pattern is an indicator of the state of the cell, and abnormal cellular states can be inferred by comparing expression profiles. Since co-regulated genes, and genes involved in a particular pathway, tend to show similar expression patterns, clustering expression patterns has become the natural method of choice for differentiating groups. However, most methods based on cluster analysis suffer from two familiar problems: (i) dead units (cluster units that never win a sample and so are never updated), and (ii) determining the correct number of clusters (k) needed to classify the data. Selecting k has been an open problem in pattern recognition and statistics for decades. Since clustering reveals similar patterns present in the data, fixing this number strongly influences the quality of the result. While there is no theoretical solution to this problem, the number of clusters can be decided by a heuristic clustering algorithm called rival penalized competitive learning (RPCL). We present a novel implementation of RPCL that transforms the problem of finding the correct number of clusters into the tractable problem of clustering based on the degree of similarity. This is biologically significant since our implementation clusters functionally co-regulated genes and genes that present similar patterns of expression. This new approach reveals potential genes that are co-involved in a biological process. This implementation of the RPCL algorithm is useful in differentiating groups involved in concerted functional regulation and helps to progressively home in on patterns that are closely similar.
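
To make the heuristic concrete, the following is a minimal sketch of the commonly described RPCL update rule, not the implementation presented in this paper. For each sample, the closest unit (the winner) moves toward the sample, while the second-closest unit (the rival) is pushed away at a much smaller rate; a win-count weighting handicaps frequent winners to reduce dead units. The function name `rpcl`, the parameters `alpha_c`, `alpha_r`, `epochs`, and `seed`, their default values, and the synthetic two-group data are all assumptions made for illustration.

```python
import random


def rpcl(data, k, alpha_c=0.05, alpha_r=0.002, epochs=50, seed=0):
    """Sketch of rival penalized competitive learning.

    data: list of equal-length lists of floats (e.g. expression profiles).
    k:    number of units to start with (2 <= k <= len(data)).
    Returns (centers, wins). All names and defaults are illustrative.
    """
    rng = random.Random(seed)
    # Initialize the k units at randomly chosen samples (copied, data is untouched).
    centers = [list(x) for x in rng.sample(data, k)]
    wins = [1] * k  # win counts used by the conscience weighting

    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x = data[i]
            total = sum(wins)
            # Squared distance scaled by relative win frequency:
            # units that win often are handicapped.
            scores = [
                (wins[j] / total) * sum((a - b) ** 2 for a, b in zip(x, centers[j]))
                for j in range(k)
            ]
            ranked = sorted(range(k), key=scores.__getitem__)
            winner, rival = ranked[0], ranked[1]
            # Winner learns: move toward the sample.
            centers[winner] = [w + alpha_c * (a - w) for a, w in zip(x, centers[winner])]
            # Rival is penalized: move away from the sample at a much smaller rate.
            centers[rival] = [w - alpha_r * (a - w) for a, w in zip(x, centers[rival])]
            wins[winner] += 1
    return centers, wins


# Synthetic example (assumed data, not gene expression measurements):
# two well-separated groups, deliberately started with more units than groups.
_rng = random.Random(42)
profiles = (
    [[_rng.gauss(0.0, 0.2), _rng.gauss(0.0, 0.2)] for _ in range(30)]
    + [[_rng.gauss(4.0, 0.2), _rng.gauss(4.0, 0.2)] for _ in range(30)]
)
centers, wins = rpcl(profiles, k=4)
for center, count in zip(centers, wins):
    print([round(v, 2) for v in center], count)
```

Starting with more units than expected groups is the intended usage: surplus units tend to be driven away from the data by the rival penalty, so the units that remain near the samples suggest a cluster count. How reliably this happens depends on the two learning rates, which this sketch does not tune.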
