PBC: A Software Framework Facilitating Pattern-Based Clustering for Microarray Data Analysis

Microarray data provide the expression patterns of thousands of genes at once. Grouping these gene expression patterns so that each group conveys biologically meaningful insight requires a clustering method. Two problems arise when conventional clustering methods are applied to microarray data analysis. First, outliers skew the computation of mean values, which in turn causes inconsistent gene expression patterns to be placed in the same group. Second, the clustering algorithms themselves generally cannot determine the right size of the clusters. We present a new method that approaches the clustering problem from a different angle: once baseline clusters are computed from the fold changes of gene expression levels, the clustering of gene expression patterns is better handled within a software framework that helps biologists derive the right cluster sizes using their understanding of the experimental context. We discuss our experiences using the framework to analyze numerous microarray experiments.
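To make the idea of baseline clusters computed from fold changes more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the PBC framework's actual algorithm. It assumes each gene has a time course of positive expression levels, maps each pair of consecutive time points to a symbol ('U' up, 'D' down, 'F' flat) using an assumed log2 fold-change threshold of 1.0 (a two-fold change), and groups genes whose symbol strings match. The function names `fold_change_pattern` and `baseline_clusters`, the symbol alphabet, the threshold, and the example data are all hypothetical.

```python
import math
from collections import defaultdict


def fold_change_pattern(levels, threshold=1.0):
    """Map one gene's expression time course to a string of U/D/F symbols.

    Hypothetical encoding: each symbol describes the log2 fold change between
    consecutive time points: 'U' if >= threshold, 'D' if <= -threshold,
    'F' (flat) otherwise. threshold=1.0 corresponds to a two-fold change.
    """
    symbols = []
    for before, after in zip(levels, levels[1:]):
        if before <= 0 or after <= 0:
            raise ValueError("expression levels are assumed to be positive")
        log2_fc = math.log2(after / before)
        if log2_fc >= threshold:
            symbols.append("U")
        elif log2_fc <= -threshold:
            symbols.append("D")
        else:
            symbols.append("F")
    return "".join(symbols)


def baseline_clusters(expression, threshold=1.0):
    """Group genes that share the same fold-change pattern (no mean is computed)."""
    clusters = defaultdict(list)
    for gene, levels in expression.items():
        clusters[fold_change_pattern(levels, threshold)].append(gene)
    return dict(clusters)


# Hypothetical example data: geneD has the same shape as geneA and geneB,
# but a far larger magnitude that would pull a mean-based centroid.
example_expression = {
    "geneA": [100, 250, 240, 60],
    "geneB": [50, 130, 120, 25],
    "geneC": [80, 85, 90, 400],
    "geneD": [10, 11000, 10500, 2000],
}

if __name__ == "__main__":
    print(baseline_clusters(example_expression))
    # {'UFD': ['geneA', 'geneB', 'geneD'], 'FFU': ['geneC']}
```

Because genes are grouped by the shape of their fold changes rather than by their distance to a mean, geneD's much larger magnitude does not separate it from genes with the same up, flat, down trajectory. Raising or lowering the hypothetical `threshold` coarsens or refines these baseline clusters; deciding how such baseline clusters should be combined into clusters of the right size is the step the framework leaves to biologists and their knowledge of the experimental context.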