A Cluster Merging Method for Time Series Microarray with Production Values

A challenging task in time-course microarray data analysis is to cluster genes meaningfully by combining the information provided by multiple replicates that cover the same key time points. This paper proposes a novel cluster merging method that accomplishes this goal and yields groups of highly correlated genes. The main idea is to start from groups created on individual time series (each representing a different biological replicate measured at the same time points) and to merge them according to the frequency with which two genes are assigned to the same group across the individual clusterings. The gene groups at the level of individual time series are generated using several shape-based clustering methods. The study focuses on a real-world time series microarray task whose aim is to find co-expressed genes related to the production and growth of a certain bacterium. The shape-based clustering methods rely on identifying similar gene expression patterns over time which, in some models, are further matched to the pattern of production or growth. The proposed cluster merging method produces meaningful gene groups that can be naturally ranked by the level of agreement on the clustering among the individual time series. The list of clusters and genes is further sorted based on the information correlation coefficient and on new problem-specific relevance measures. The computational experiments and the results of the cluster merging method are analyzed from a biological perspective and compared with the clustering obtained by applying the same shape-based algorithm to the mean value of the time series.
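
The merging idea described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each replicate has already been clustered into one label per gene, measures agreement as the fraction of replicates in which two genes share a cluster, and groups genes by linking every pair whose agreement reaches a threshold. The function names `co_association` and `merge_clusters`, the parameter `min_agreement`, and the toy labels are all hypothetical.

```python
from itertools import combinations


def co_association(clusterings):
    """Return, for every gene pair, the fraction of clusterings
    in which both genes carry the same cluster label."""
    n_genes = len(clusterings[0])
    freq = {}
    for i, j in combinations(range(n_genes), 2):
        together = sum(1 for labels in clusterings if labels[i] == labels[j])
        freq[(i, j)] = together / len(clusterings)
    return freq


def merge_clusters(clusterings, min_agreement):
    """Group genes by linking every pair whose co-association frequency
    is at least min_agreement (an assumed, illustrative merging rule)."""
    n_genes = len(clusterings[0])
    parent = list(range(n_genes))  # union-find forest over gene indices

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), f in co_association(clusterings).items():
        if f >= min_agreement:
            parent[find(i)] = find(j)

    groups = {}
    for gene in range(n_genes):
        groups.setdefault(find(gene), []).append(gene)
    return sorted(groups.values())


# Toy input: three replicates, five genes, one cluster label per gene.
replicates = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 2],
]
full_agreement = merge_clusters(replicates, min_agreement=1.0)
majority_agreement = merge_clusters(replicates, min_agreement=0.6)
```

In this toy input, genes 0 and 1 share a cluster in all three replicates, while genes 2 and 3 do so in two of three. Running the sketch at decreasing thresholds is one simple way to obtain groups ordered by level of agreement, in the spirit of the ranking mentioned in the abstract.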
