Data Integration and Knowledge Discovery in Life Sciences

Recent advances in omics technologies have generated huge amounts of data. To fully exploit these data sets, many of which are publicly available, robust computational methodologies need to be developed for the storage, integration, analysis, visualization, and dissemination of the data. In this paper, we describe some of our research activities in data integration that lead to novel knowledge discovery in the life sciences. Our multistrategy approach, which integrates prior knowledge, provides a novel means of identifying informative genes that commonly used methods could miss. Our transcriptomics-proteomics integrative framework increases confidence in transcriptomics discoveries and also complements them. Our new research direction in the integrative analysis of omics data aims to identify molecular associations with disease and therapeutic response signatures. The ultimate goal of this research is to facilitate the development of clinical test kits for early detection, accurate diagnosis and prognosis of disease, and better personalized therapeutic management.
