ANALYSIS AND INTEGRATION OF BIOLOGICAL DATA: A DATA MINING APPROACH USING NEURAL NETWORKS

The volume of information derived from postgenomic technologies is increasing rapidly. Because of the amount of data involved, novel computational methods are needed for analysis and knowledge discovery in the massive data sets these technologies produce. Data integration is also gaining attention as a way to merge signals from different sources and uncover unknown relations. This chapter presents a pipeline for biological data integration and for the discovery of a priori unknown relationships between gene expression and metabolite variations. In this pipeline, two standard clustering methods are compared against a novel neural network approach. The neural model provides a simple visualization interface for identifying coordinated pattern variations, independently of the number of clusters produced. Several quality measures are defined to evaluate the clustering results obtained in a case study involving transcriptomic and metabolomic profiles from tomato fruits. A method is also proposed for evaluating the biological significance of the clusters found. The neural model performed well on most of the quality measures, showed internal coherence in all identified clusters, and offered better visualization capabilities than the standard methods.
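As a rough illustration of the integration idea, the sketch below assumes the neural model is a small self-organizing map trained on standardized gene and metabolite profiles placed in one shared data set. The abstract does not name the architecture, so this is an assumption rather than the chapter's actual method. All names (zscore, train_som, best_matching_unit), the parameter values, and the toy profiles are hypothetical and invented for illustration; they are not taken from the tomato case study.

```python
import math
import random

def zscore(profile):
    """Standardize one profile to zero mean and unit variance."""
    mean = sum(profile) / len(profile)
    var = sum((x - mean) ** 2 for x in profile) / len(profile)
    std = math.sqrt(var) or 1.0  # guard against constant profiles
    return [(x - mean) / std for x in profile]

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols map with a Gaussian neighborhood (illustrative settings)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.uniform(-1.0, 1.0) for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighborhood radius shrinks
        order = list(data)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Toy profiles across four samples (assumed values, for illustration only).
profiles = {
    "gene_up": [1.0, 2.0, 3.0, 4.0],
    "metabolite_up": [10.0, 20.0, 30.0, 40.0],
    "gene_down": [4.0, 3.0, 2.0, 1.0],
    "metabolite_down": [40.0, 30.0, 20.0, 10.0],
}
standardized = {name: zscore(p) for name, p in profiles.items()}
som = train_som(list(standardized.values()))
placement = {name: best_matching_unit(som, v) for name, v in standardized.items()}
```

Standardizing each profile first puts transcript and metabolite measurements on a common scale, so a gene and a metabolite that vary in a coordinated way across samples can fall on the same map node regardless of their original units. Inspecting placement then shows which genes and metabolites share a node.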
