Exploiting inter-gene information for microarray data integration

Microarray data integration is an important yet challenging problem. Direct integration of microarrays after normalization is usually ineffective because of diverse experiment-specific variations. To address this issue, two novel integration approaches were proposed in recent microarray studies. The first study [16] presented a cancer classification technique that identifies gene pairs whose expression orders are consistent within a class and different across classes. The other study [18] presented a promising gene expression analysis technique that uses pairwise correlations of gene expression across different microarray datasets. Interestingly, we observe that both of these independently developed techniques rely on inter-gene information and a noise-filtering strategy to achieve satisfactory performance in microarray integration. Motivated by this observation, we propose in this paper a formal data model for microarray integration using inter-gene information and effective filtering, which generalizes the previous two frameworks. We also show how the proposed model can handle a broader range of problems than the previous frameworks.
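As a rough illustration of the first kind of inter-gene information described above (gene pairs whose expression order is consistent within a class and differs across classes), the following minimal sketch scores every gene pair by how differently its ordering behaves in two classes and then filters out low-scoring pairs. It is not the formal model proposed in the paper or the exact procedure of the cited study; the function names `pair_order_scores` and `filter_pairs`, the threshold, and the toy data are all assumptions made for illustration.

```python
from itertools import combinations


def pair_order_scores(samples, labels):
    """Score each gene pair (i, j) by |P(x_i < x_j | class 0) - P(x_i < x_j | class 1)|.

    samples: list of expression profiles (one list of gene values per sample).
    labels:  list of class labels, each 0 or 1 (hypothetical two-class setting).
    """
    n_genes = len(samples[0])
    by_class = {0: [], 1: []}
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    scores = {}
    for i, j in combinations(range(n_genes), 2):
        freq = []
        for c in (0, 1):
            rows = by_class[c]
            # Fraction of samples in class c where gene i is expressed below gene j.
            freq.append(sum(r[i] < r[j] for r in rows) / len(rows))
        scores[(i, j)] = abs(freq[0] - freq[1])
    return scores


def filter_pairs(scores, threshold):
    """Keep only pairs whose score reaches the (assumed) threshold."""
    return sorted(pair for pair, s in scores.items() if s >= threshold)


# Toy data: two "studies" measured on very different scales, pooled directly.
samples = [
    [1.0, 2.0, 5.0],        # study A, class 0
    [2.0, 1.0, 5.0],        # study A, class 1
    [100.0, 250.0, 400.0],  # study B, class 0
    [300.0, 100.0, 900.0],  # study B, class 1
]
labels = [0, 1, 0, 1]

scores = pair_order_scores(samples, labels)
selected = filter_pairs(scores, threshold=0.9)
print(selected)
```

Because only the within-sample ordering of two genes is used, the score is unaffected by the scale difference between the two toy studies, which is the intuition behind relying on inter-gene information rather than raw expression values.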

[1]  D. Botstein, et al. Cluster analysis and display of genome-wide expression patterns. 1998, Proceedings of the National Academy of Sciences of the United States of America.

[2]  Daniel Q. Naiman, et al. Classifying Gene Expression Profiles from Pairwise mRNA Comparisons. 2004, Statistical applications in genetics and molecular biology.

[3]  George C Tseng, et al. Tight Clustering: A Resampling‐Based Approach for Identifying Stable and Tight Patterns in Data. 2005, Biometrics.

[4]  W. Wong, et al. Transitive functional annotation by shortest-path analysis of gene expression data. 2002, Proceedings of the National Academy of Sciences of the United States of America.

[5]  W. Wong, et al. Functional annotation and network reconstruction through cross-platform integration of microarray data. 2005, Nature Biotechnology.

[6]  R. Bast, et al. Three Biomarkers Identified from Serum Proteomic Analysis for the Detection of Early Stage Ovarian Cancer. 2004, Cancer Research.

[7]  Hans-Peter Kriegel, et al. Protein function prediction via graph kernels. 2005, ISMB.

[8]  K. Pienta, et al. Tissue Microarray Sampling Strategy for Prostate Cancer Biomarker Analysis. 2002, The American journal of surgical pathology.

[9]  Alexander J. Hartemink, et al. Informative Structure Priors: Joint Learning of Dynamic Regulatory Networks from Multiple Types of Data. 2004, Pacific Symposium on Biocomputing.

[10]  Daniel Q. Naiman, et al. Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data. 2005, Bioinform.

[11]  J. Glimm, et al. Detection of cancer-specific markers amid massive mass spectral data. 2003, Proceedings of the National Academy of Sciences of the United States of America.

[12]  H. Mewes, et al. Functional modules by relating protein interaction networks and gene expression. 2003, Nucleic acids research.

[13]  J. Tenenbaum, et al. A global geometric framework for nonlinear dimensionality reduction. 2000, Science.

[14]  P. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. 1987.

[15]  M. Gerstein, et al. Relating whole-genome expression data with protein-protein interactions. 2002, Genome research.

[16]  L. Penland, et al. Use of a cDNA microarray to analyse gene expression patterns in human cancer. 1996, Nature Genetics.

[17]  R. Tibshirani, et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression. 2002, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Laurie J. Heyer, et al. Exploring expression data: identification and analysis of coexpressed genes. 1999, Genome research.

[19]  M. Ashburner, et al. Gene Ontology: tool for the unification of biology. 2000, Nature Genetics.

[20]  Nello Cristianini, et al. Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast. 2003, Pacific Symposium on Biocomputing.