Systematic analysis of DNA microarray data: ordering and interpreting patterns of gene expression.

biological order is the hierarchical pattern in the data that tracks the lineage splitting and divergence represented by a dendrogram or tree. In contrast, systematic treatment of microarray data assumes that order intrinsic to gene expression profiles will yield insights into molecular, cellular, and tissue level processes and functions. This approach, in turn, might allow for improved disease classification, diagnosis, prognosis, and drug design, among other pharmaceutical and medical goals. The assumptions that are appropriate for any analytical method are determined by the type of biological order that the method seeks to recover. Although gene expression data are similar to other types of data collected for traditional systematic studies (e.g., DNA sequence data, morphological data), it is not immediately obvious how techniques initially designed to elucidate relationships between organisms should be applied to gene expression profiles. There is a longstanding philosophical debate contrasting similarity-based and character-based methods in the analysis of problems in evolutionary biology and organismal classification. Because the most widely used methods in microarray studies are based upon some measurement of overall similarity of genes or cells or tissue types, it may be informative to revisit this debate as it applies to microarray studies. Although both overall similarity and character-based techniques can produce trees, or branching diagrams, the fundamental assumptions and interpretations of the outcomes differ significantly. The choice between the two depends, therefore, on what the researcher is asking, the nature of the data being collected, and the biological context of the study.