Combining feature selection and DTW for time-varying functional genomics

Given temporal high-throughput data defining a two-class functional genomic process, feature selection algorithms may be applied to extract a panel of discriminating gene time series. We aim to identify the main trends of activity through time. A reconstruction method based on stagewise boosting is endowed with a similarity measure based on the dynamic time warping (DTW) algorithm, defining a ranked set of time-series components that contribute most to the reconstruction. The approach is applied to synthetic and public microarray data. On the Cardiogenomics PGA Mouse Model of Myocardial Infarction, the approach identifies a time-varying molecular profile of the ventricular remodeling process.
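To make the DTW-based similarity measure concrete, the following is a minimal sketch of the classic dynamic-programming DTW distance, used here to rank candidate gene time series by closeness to a target profile. It is not the stagewise boosting reconstruction described above. The function names `dtw_distance` and `rank_by_dtw`, the absolute-difference local cost, and the toy profiles are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Assumption: local cost is the absolute difference, with no
    warping-window constraint.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is also matched to b[j-2] or earlier
                cost[i][j - 1],      # b[j-1] is also matched to a[i-2] or earlier
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


def rank_by_dtw(target, candidates):
    """Return candidate names sorted by increasing DTW distance to target."""
    return sorted(candidates, key=lambda name: dtw_distance(target, candidates[name]))


# Hypothetical toy profiles, not data from the study.
target = [0, 1, 2, 1, 0]
candidates = {
    "g1": [0, 0, 1, 2, 1, 0],  # same shape, delayed onset
    "g2": [2, 1, 0, 1, 2],     # inverted shape
    "g3": [0, 1, 2, 1, 1],     # same shape, incomplete return to baseline
}
ranking = rank_by_dtw(target, candidates)
print(ranking)
```

Because DTW allows one time point to align with several points in the other series, the delayed profile `g1` is treated as identical in shape to the target, which a pointwise Euclidean comparison would penalize.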
