Approaches to analyse and interpret biological profile data

Advances in biotechnology are rapidly increasing the number of molecules of a cell that can be observed simultaneously. This includes the expression levels of thousands or tens of thousands of genes as well as the concentration levels of metabolites or proteins. Such profile data, observed at different times or under different experimental conditions (e.g., heat or drought stress), show how the biological experiment is reflected at the molecular level. This information helps to understand molecular behaviour and to identify molecules or combinations of molecules that characterise a specific biological condition (e.g., a disease). This work shows the potential of component extraction algorithms to identify the major factors that influenced the observed data. These can be expected experimental factors such as time or temperature, as well as unexpected factors such as technical artefacts or even unknown biological behaviour. Extracting components means reducing the very high-dimensional data to a small set of new variables termed components. Each component is a combination of all original variables. The classical approach for this purpose is principal component analysis (PCA). It is shown that, in contrast to PCA, which only maximises the variance, modern approaches such as independent component analysis (ICA) are more suitable for analysing molecular data. The condition of independence between the components of ICA fits more naturally with our assumption of individual (independent) factors that influence the data. This higher potential of ICA is demonstrated on a crossing experiment with the model plant Arabidopsis thaliana (thale cress). The experimental factors could be identified well and, in addition, ICA could even detect a technical artefact. However, for continuous observations such as time-course experiments, the data generally show a nonlinear distribution. To analyse such nonlinear data, a nonlinear extension of PCA is used.
This nonlinear PCA (NLPCA) is based on a neural network algorithm. The algorithm is adapted so that it can be applied to incomplete molecular data sets. Thus, it also provides the ability to estimate missing data. The potential of nonlinear PCA to identify nonlinear factors is demonstrated on a cold stress experiment with Arabidopsis thaliana. The results of component analysis can be used to build a molecular network model. Since it includes functional dependencies, it is termed a functional network. Applied to the cold stress data, it is shown that functional networks are appropriate for visualising biological processes and thereby reveal molecular dynamics.
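As an illustration of what "extracting a component" means in the linear case, the following minimal sketch computes the first principal component of a small synthetic data set using power iteration on the covariance matrix. It is not the method or the data of this work: the data, the hidden factor, the loadings and all function names (center, covariance, first_component, project) are assumptions made up for this example, and it covers classical PCA only, not ICA or NLPCA.

```python
import math
import random

def center(rows):
    """Subtract the per-variable mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (variables x variables)."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iterations=200):
    """Direction of maximal variance, found by power iteration."""
    d = len(cov)
    w = [1.0 / math.sqrt(d)] * d  # arbitrary unit-length start vector
    for _ in range(iterations):
        v = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        w = [x / norm for x in v]
    return w

def project(centered, w):
    """Component score of each sample: a weighted sum of ALL variables."""
    return [sum(x * wj for x, wj in zip(r, w)) for r in centered]

# Synthetic "profile data" (assumption): 30 samples, 4 variables, all
# driven by one hidden factor (e.g. time) plus small measurement noise.
rng = random.Random(0)
hidden = [-1.0 + 2.0 * k / 29 for k in range(30)]
loadings = [2.0, 1.0, -1.0, 0.5]  # hypothetical influence per variable
data = [[l * t + rng.gauss(0.0, 0.1) for l in loadings] for t in hidden]

centered = center(data)
weights = first_component(covariance(centered))
scores = project(centered, weights)

print("weights:", [round(w, 3) for w in weights])
print("first scores:", [round(s, 3) for s in scores[:3]])
```

In this toy setting the single component summarises four variables in one new variable whose scores follow the hidden factor. Because PCA only maximises variance, it recovers the factor here simply because that factor dominates the variance; with several overlapping factors that need not hold, which is the motivation given above for ICA.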
