Information Visualization Techniques for Metabolic Engineering

The main purpose of metabolic engineering is to modify biological systems towards specific goals using genetic manipulation. To this end, models are built that describe the stationary and dynamic behaviour of biochemical reaction networks inside a biological cell, and simulations based on these models are carried out to understand the cell’s behaviour. The modeling process generates large amounts of data, both during the modeling itself and after the simulation of the created models. Manual interpretation of these data is almost impossible; consequently, appropriate techniques for supporting their analysis and visualization are needed. The purpose of this thesis is to investigate visualization and data mining techniques to support the metabolic modeling process. The work presented in this thesis is divided into several tracks:

• Visualization of metabolic networks and the associated simulation data. Novel visualization techniques are presented that allow the visual exploration of metabolic network dynamics, going beyond static plots of the simulated data. Node-link representations of the metabolic network are animated using the time series of metabolite concentrations and reaction rates. In this way, bottlenecks and active parts of metabolic networks can be distinguished. Additionally, 3D visualization techniques are explored for crossing-free drawing of metabolic networks in 3D space. Steerable drawing of metabolic networks is also investigated. In contrast to other approaches, user-guided drawing allows the creation of high-quality drawings by including user feedback in the drawing process.

• Comparison of XML/SBML files. SBML (Systems Biology Markup Language) has become ubiquitous in metabolic modeling, where it serves the storage and exchange of models in XML format. Generally, the modeling process is an iterative task in which the next-generation model is a further development of the current model, resulting in a family of models stored in SBML format. The SBML format, however, includes a great deal of information, from the structure of the biochemical network to the parameters of the model and measured data. To compare the models of such a family selectively, the CustX-Diff algorithm for a customizable comparison of XML files is introduced. Customizing the comparison through the specification of XPath expressions enables an adaptable change detection process. Thus, the comparison can be focused on specific parts of an XML/SBML document, e.g. on the structure of a metabolic network.

• Visual exploration of time-varying sensitivity matrices. Sensitivity analysis is used in simulation to determine how strongly the behaviour of a model depends on its parameters. The results of a sensitivity analysis of a metabolic network are large time-varying matrices, which need to be properly visualized. However, the visualization of time-varying high-dimensional data is a challenging problem. For this purpose, an extensible framework is proposed, consisting of existing and novel visualization methods that allow the visual exploration of time-varying sensitivity matrices. Tabular visualization techniques, such as the reorderable matrix, are developed further, and algorithms for their reordering are discussed. Existing and novel techniques for exploring proximity data, both in matrix form and projected using multi-dimensional scaling (MDS), are also discussed. Information visualization paradigms such as focus+context based distortion and overview+details are proposed to enhance such techniques.

• Cluster ensembles for analyzing time-varying sensitivity matrices. A novel relationship-based cluster ensemble, which relies on the accumulation of the evolving pairwise similarities of objects (i.e. parameters), is proposed as a robust and efficient method for clustering time-varying high-dimensional data. The time-dependent similarities, obtained from the fuzzy partitions created during the fuzzy clustering process, are aggregated, and the final clustering result is derived from this aggregation.
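The accumulation idea behind the cluster ensemble can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not state how the pairwise similarities are computed from the fuzzy partitions or how the final clustering is derived. The sketch assumes that the similarity of two parameters at one time step is the inner product of their fuzzy membership vectors, that the aggregation is a plain average over time, and that the final clusters are the connected components of the thresholded accumulated matrix. All function names and the example data are hypothetical.

```python
import numpy as np

def co_association(memberships):
    """Pairwise similarity of objects from one fuzzy partition.

    memberships: (n_objects, n_clusters) array whose rows sum to 1.
    Entry (i, j) is the sum over clusters c of u_ic * u_jc (an assumed
    similarity measure, not taken from the thesis).
    """
    u = np.asarray(memberships, dtype=float)
    return u @ u.T

def accumulate(partitions):
    """Average the per-time-step similarity matrices (assumed aggregation)."""
    return np.mean([co_association(u) for u in partitions], axis=0)

def consensus_labels(similarity, threshold=0.5):
    """Link objects whose accumulated similarity exceeds the threshold and
    return connected-component labels (equivalent to a single-link cut)."""
    n = similarity.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i, j] > threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    order = {}
    return [order.setdefault(find(i), len(order)) for i in range(n)]

# Hypothetical fuzzy partitions of four parameters at two time steps.
partitions = [
    [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]],
    [[0.7, 0.3], [0.9, 0.1], [0.3, 0.7], [0.1, 0.9]],
]
accumulated = accumulate(partitions)
labels = consensus_labels(accumulated)
print(labels)
```

With these made-up memberships, the first two parameters stay similar at both time steps, as do the last two, so the thresholded accumulation groups them into two clusters. The threshold of 0.5 is an arbitrary choice for the illustration.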
