Constrained clustering of gene expression profiles

This paper presents a querying environment for analyzing patients' clinical data. The data consist of two parts: the patients' pathological data and the corresponding gene expression levels. The environment includes a generic algorithm for constructing decision trees, as well as algorithms for discretizing gene expression levels and for searching for frequent patterns (itemsets). These algorithms are accessed through a query language, which can be used to simulate various data mining algorithms, such as the Itemset Constrained Clustering algorithm developed by Morishita et al.
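As we understand it, Itemset Constrained Clustering looks for a combination of pathological features (an itemset) that splits the patients into two groups: those having all of the features and the rest. The goal is to make the gene expression profiles of the two groups as well separated as possible. The Python sketch below is a minimal illustration under stated assumptions. It assumes interclass variance as the separation measure and enumerates small itemsets exhaustively. The function names, the `max_size` and `min_support` parameters, and the toy data are hypothetical. They reflect neither the query language of this paper nor the original implementation by Morishita et al.

```python
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Set, Tuple

Vector = Sequence[float]


def mean(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def interclass_variance(profiles: List[Vector], members: Set[int]) -> float:
    """Assumed criterion: sum over both groups of |G| * ||mean(G) - mean(all)||^2."""
    overall = mean(profiles)
    inside = [p for i, p in enumerate(profiles) if i in members]
    outside = [p for i, p in enumerate(profiles) if i not in members]
    score = 0.0
    for group in (inside, outside):
        if group:  # an empty group contributes nothing
            m = mean(group)
            score += len(group) * sum((a - b) ** 2 for a, b in zip(m, overall))
    return score


def itemset_constrained_clustering(
    features: List[Set[str]],   # hypothetical: pathological features per patient
    profiles: List[Vector],     # hypothetical: expression profile per patient
    max_size: int = 2,          # hypothetical: largest itemset to try
    min_support: int = 1,       # hypothetical: minimum number of covered patients
) -> Tuple[Optional[FrozenSet[str]], float]:
    """Exhaustively try itemsets up to max_size and keep the best-scoring split."""
    items = sorted(set().union(*features))
    best: Optional[FrozenSet[str]] = None
    best_score = -1.0
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            # Patients whose feature set contains every item of the itemset.
            members = {i for i, f in enumerate(features) if itemset <= f}
            if len(members) < min_support:
                continue
            score = interclass_variance(profiles, members)
            if score > best_score:
                best, best_score = itemset, score
    return best, best_score


if __name__ == "__main__":
    features = [{"stage3", "smoker"}, {"stage3"}, {"smoker"}, set()]
    profiles = [[5.0, 1.0], [4.8, 1.2], [1.0, 0.9], [1.2, 1.1]]
    print(itemset_constrained_clustering(features, profiles))
```

On the toy data, `{"stage3"}` should win because it separates the two high-expression patients from the two low-expression ones. Exhaustive enumeration grows combinatorially with the number of features, so a practical implementation would prune the itemset lattice rather than scan it in full.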