Data visualization methodologies for data mining systems in bioinformatics

Bioinformatics systems benefit from data mining strategies that locate interesting and pertinent relationships within massive data sets. For example, data mining methods can identify and summarize the set of genes that respond to a certain level of stress in an organism. Even a cursory glance through the journal literature reveals the persistent role of data mining in experimental biology. Integrating data mining into experimental investigations is therefore central to bioinformatics software. In this paper we describe the framework of probabilistic principal surfaces, a latent variable model that offers a wide variety of appealing visualization capabilities and can be successfully integrated into microarray analysis. We add to this framework a preprocessing phase consisting of a nonlinear PCA neural network, which appears to be very useful for dealing with the noisy and time-dependent nature of microarray data.
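As a rough illustration of the preprocessing phase mentioned above, the following is a minimal sketch of a nonlinear PCA neural network applied to expression profiles. The abstract does not specify the network architecture or learning rule, so everything here is an assumption: the sketch uses an Oja-type nonlinear PCA subspace rule with a tanh nonlinearity, and the function names, learning rate, and toy data are hypothetical. It is not the authors' implementation, and it does not cover the probabilistic principal surfaces model itself.

```python
import numpy as np


def nonlinear_pca_fit(X, n_components, lr=0.01, epochs=20, seed=0):
    """Fit a weight matrix with an Oja-type nonlinear PCA subspace rule.

    Assumed update (not taken from the paper):
        y = W^T x
        W <- W + lr * (x - W g(y)) g(y)^T,   with g = tanh
    X is expected to be standardized, shape (n_samples, n_features).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Small random initial weights, shape (n_features, n_components).
    W = rng.normal(scale=0.1, size=(n_features, n_components))
    for _ in range(epochs):
        # Visit the samples in a fresh random order each epoch.
        for i in rng.permutation(n_samples):
            x = X[i]
            gy = np.tanh(W.T @ x)          # nonlinear outputs g(y)
            residual = x - W @ gy          # reconstruction error
            W += lr * np.outer(residual, gy)
    return W


def nonlinear_pca_transform(X, W):
    """Project samples onto the learned nonlinear components."""
    return np.tanh(X @ W)


def standardize(X):
    """Center each feature and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


# Toy stand-in for a microarray matrix: 50 genes x 8 time points.
# Purely synthetic; real data would be loaded from an experiment.
_rng = np.random.default_rng(42)
_t = np.linspace(0.0, 2.0 * np.pi, 8)
_phases = _rng.uniform(0.0, 2.0 * np.pi, size=(50, 1))
toy_expression = np.sin(_t + _phases) + 0.3 * _rng.normal(size=(50, 8))

toy_W = nonlinear_pca_fit(standardize(toy_expression), n_components=3)
toy_features = nonlinear_pca_transform(standardize(toy_expression), toy_W)
```

In a pipeline of the kind the abstract outlines, features such as `toy_features` would then be passed to the latent variable model for visualization. The choice of three components here is arbitrary.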
