Identifying tissue-enriched gene expression in mouse tissues using the NIH UniGene database.

There is considerable interest in the gene expression profiles that underpin the phenotypes of cells and tissues. We have developed Bioperl scripts that mine the National Institutes of Health (NIH) UniGene databases to identify tissue-enriched gene expression. UniGene imports expressed sequence tags (ESTs) from the NIH dbEST database and clusters them by searching for sequence matches. In principle, each UniGene cluster represents the product(s) of a single transcriptional unit in the genome. Because a transcriptional unit can be expressed in a range of cell types, the ESTs in a UniGene cluster can originate from many different tissues. A cluster whose transcript is expressed predominantly or uniquely in one tissue will contain a high proportion of ESTs from that tissue. Our Bioperl scripts parse the NIH UniGene data files as the starting point for an in-house UniGene database. For each UniGene cluster, we then count the ESTs that derive from a specified set of dbEST libraries and the total number of ESTs in the cluster; the ratio of the first count to the second gives a measure of enrichment. In this paper, we identify tissue-enriched gene expression in mouse pancreas, mammary gland and heart. Each tissue-enriched expression profile identifies genes that are recognisably characteristic of the respective tissue. It also identifies significant numbers of tissue-enriched UniGene clusters that are derived from transcriptional units with no known function. These genes may perform important and specialised functions in the tissue in question and may offer targets for drug action.
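
The enrichment measure described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Bioperl implementation used in this work. The cluster identifiers, library names, function names and in-memory data layout are all hypothetical, and the ordering of tied clusters by size is an assumption added for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def enrichment_ratio(est_libraries: Iterable[str], tissue_libraries: Set[str]) -> float:
    """Return the fraction of a cluster's ESTs that come from the specified libraries.

    est_libraries holds one library identifier per EST in the cluster.
    """
    libs = list(est_libraries)
    if not libs:
        return 0.0  # an empty cluster has no measurable enrichment
    from_tissue = sum(1 for lib in libs if lib in tissue_libraries)
    return from_tissue / len(libs)


def rank_clusters(
    clusters: Dict[str, List[str]], tissue_libraries: Set[str]
) -> List[Tuple[str, float, int]]:
    """Score every cluster and sort by ratio, then by cluster size (both descending)."""
    scored = [
        (cluster_id, enrichment_ratio(libs, tissue_libraries), len(libs))
        for cluster_id, libs in clusters.items()
    ]
    return sorted(scored, key=lambda row: (-row[1], -row[2]))


# Toy data: hypothetical cluster and library identifiers, not real UniGene records.
clusters = {
    "cluster_A": ["lib_pancreas_1", "lib_pancreas_2", "lib_pancreas_1", "lib_brain_1"],
    "cluster_B": ["lib_brain_1", "lib_liver_1"],
    "cluster_C": ["lib_pancreas_1"],
}
pancreas_libraries = {"lib_pancreas_1", "lib_pancreas_2"}

ranked = rank_clusters(clusters, pancreas_libraries)
for cluster_id, ratio, n_ests in ranked:
    print(f"{cluster_id}\t{ratio:.2f}\t{n_ests}")
```

In this toy data a cluster containing a single EST receives the maximum ratio of 1.0, so in practice a minimum cluster size might be applied before ranking; that threshold would be an additional assumption and is not part of the measure as stated here.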