The Utility of Sequence, Function and Transcriptional Information in Gene Expression Profiling

As the determination of the DNA sequences comprising the genomes of various organisms is complete or nears completion, a shift from static structural genomics to dynamic functional genomics is taking place. In this paper we focus on functional genomics, and in particular on the analysis of microarray data. Microarray, or gene-expression, data analysis depends heavily on Gene Expression Data Mining (GEDM) technology, which has attracted considerable research effort in recent years. GEDM is used to identify intrinsic patterns and relationships in gene expression data. The identification of patterns in complex gene expression datasets provides two benefits: (i) insight into gene regulation, and (ii) characterization of multiple gene expression profiles in complex biological processes, e.g. pathological states [8]. GEDM activities are based on two approaches: (a) hypothesis testing, to investigate the induction or perturbation of a biological process that leads to predicted results, and (b) knowledge discovery, to detect internal structure in biological data [1, 13]. In this paper we present an integrated methodology that combines both. It is based on a hybrid clustering approach able to compute and utilize different distances (or similarities) between the objects to be clustered. In this respect the whole exploratory data analysis process becomes better informed, in the sense that pre-established domain knowledge is used to guide the clustering.
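
To make the idea of clustering under more than one distance concrete, the following is a minimal sketch and not the method of this paper. It assumes that each gene has a numeric expression profile and a single categorical functional label, mixes a scaled Euclidean distance with a simple label-mismatch distance, and groups genes by single-linkage thresholding. All function names, the `weight` and `threshold` parameters, the gene identifiers and the labels are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def expression_distance(a, b):
    """Euclidean distance between two expression profiles of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def annotation_distance(f, g):
    """Mismatch distance between two functional labels: 0 if equal, else 1."""
    return 0.0 if f == g else 1.0


def hybrid_distances(profiles, labels, weight=0.5):
    """Mix scaled expression distance with annotation distance.

    weight=1.0 uses expression data only; weight=0.0 uses labels only.
    (Assumed weighting scheme, for illustration.)
    """
    genes = sorted(profiles)
    raw = {(g, h): expression_distance(profiles[g], profiles[h])
           for g, h in combinations(genes, 2)}
    # Scale expression distances into [0, 1]; guard against an all-zero case.
    scale = max(raw.values(), default=0.0) or 1.0
    return {(g, h): weight * d / scale
            + (1.0 - weight) * annotation_distance(labels[g], labels[h])
            for (g, h), d in raw.items()}


def threshold_clusters(genes, distances, threshold):
    """Single-linkage grouping: link any pair whose distance <= threshold."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent pointers to the representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g, h), d in distances.items():
        if d <= threshold:
            parent[find(g)] = find(h)

    groups = {}
    for g in genes:
        groups.setdefault(find(g), set()).add(g)
    return sorted(sorted(members) for members in groups.values())


# Toy data (hypothetical): g3 and g4 have similar profiles but different labels.
profiles = {"g1": [0.0, 1.0, 2.0], "g2": [0.1, 1.1, 2.1],
            "g3": [2.0, 1.0, 0.0], "g4": [2.1, 0.9, 0.1]}
labels = {"g1": "kinase", "g2": "kinase",
          "g3": "transporter", "g4": "kinase"}

expression_only = threshold_clusters(
    profiles, hybrid_distances(profiles, labels, weight=1.0), threshold=0.3)
knowledge_guided = threshold_clusters(
    profiles, hybrid_distances(profiles, labels, weight=0.5), threshold=0.3)
```

In this toy setting the expression-only run groups g3 with g4, while the run that also weighs the functional labels keeps them apart. This is one simple way in which domain knowledge can steer the outcome of a clustering.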

[1] Jason Weston, et al. Learning Gene Functional Classifications from Multiple Data Types, 2002, J. Comput. Biol.

[2] Douglas H. Fisher, et al. Knowledge Acquisition Via Incremental Conceptual Clustering, 1987, Machine Learning.

[3] George Potamias. Distance and Feature-Based Clustering of Time Series: An Application on Neurophysiology, 2002, SETN.

[4] G. Zweiger, et al. Knowledge discovery in gene-expression-microarray data: mining the information output of the genome, 1999, Trends in Biotechnology.

[5] Y. Xu, et al. Minimum spanning trees for gene expression data clustering, 2001, Genome Informatics. International Conference on Genome Informatics.

[6] M. Eisen, et al. Gene expression informatics—it's all in your mine, 1999, Nature Genetics.

[7] J. Barker, et al. Large-scale temporal gene expression mapping of central nervous system development, 1998, Proceedings of the National Academy of Sciences of the United States of America.

[8] Charles T. Zahn, et al. Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters, 1971, IEEE Transactions on Computers.

[9] Inmaculada Fortes, et al. Dynamic Discretization of Continuous Values from Time Series, 2000, ECML.

[10] M. Schena, et al. Microarrays: biotechnology's discovery platform for functional genomics, 1998, Trends in Biotechnology.

[11] Jill P. Mesirov, et al. Computational Biology, 2018, Encyclopedia of Parallel Computing.

[12] Tony R. Martinez, et al. Improved Heterogeneous Distance Functions, 1996, J. Artif. Intell. Res.

[13] Jason Weston, et al. Gene functional classification from heterogeneous data, 2001, RECOMB.

[14] Philip Laird, et al. Identifying and Using Patterns in Sequential Data, 1993, ALT.