HICCUP: Hierarchical Clustering Based Value Imputation using Heterogeneous Gene Expression Microarray Datasets

A novel microarray value imputation method, HICCUP, is presented. HICCUP improves upon existing value imputation methods in several ways: (1) by judiciously integrating heterogeneous microarray datasets using hierarchical clustering, HICCUP overcomes the limitation of using only a single dataset with a limited number of samples; (2) unlike local or global value imputation methods, HICCUP mines association rules to select appropriate subsets of the most relevant samples for better value imputation; and (3) by exploiting relationships in the sample space (e.g., cancer vs. non-cancer samples), HICCUP improves the accuracy of value imputation. Experiments with a real prostate cancer microarray dataset verify that HICCUP outperforms existing approaches.
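To make point (3) concrete, the following is a minimal sketch of imputation restricted to a sample subset. It is not the HICCUP algorithm: it assumes the relevant subset is simply the set of samples sharing the target sample's class label, and it substitutes a plain class-conditional mean for the hierarchical clustering and association rule mining that the abstract describes. The function name `impute_by_class` and the toy data are hypothetical.

```python
from statistics import mean


def impute_by_class(matrix, labels):
    """Fill missing entries (None) in a genes x samples matrix.

    Illustrative assumption: each missing value is replaced by the mean of
    the same gene over observed samples with the same class label as the
    target sample. If no such sample exists, all observed samples for that
    gene are used instead. This stands in for, and is much simpler than,
    the sample selection described in the abstract.
    """
    filled = [row[:] for row in matrix]  # copy so the input is not modified
    for g, row in enumerate(matrix):
        observed = [v for v in row if v is not None]
        for s, value in enumerate(row):
            if value is not None:
                continue
            same_class = [
                v for j, v in enumerate(row)
                if v is not None and labels[j] == labels[s]
            ]
            pool = same_class or observed  # fall back to all observed values
            if pool:
                filled[g][s] = mean(pool)
    return filled


# Toy example: one gene measured on two non-cancer and two cancer samples.
labels = ["normal", "normal", "cancer", "cancer"]
expression = [[1.0, None, 9.0, 11.0]]
result = impute_by_class(expression, labels)
```

In this toy case the missing value belongs to a non-cancer sample, so it is filled from the other non-cancer sample only; a global row mean would instead be pulled toward the much higher cancer values.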
