From transcriptomics to bibliomics.

BACKGROUND: Current biological investigations tend to work with whole genomes rather than with individual genes, as was the case during the last century. It is possible to compare entire genomes, transcriptomes or proteomes, using alphanumeric data corresponding to the differential expression levels of thousands of genes. What remains difficult is to link array results to factual or bibliographical data and to retrieve information that is highly structured and, in Shannon's sense, rare. MATERIAL/METHODS: We have developed a tool, Documentation and Information LIBrary (DILIB), that enables us to retrieve, organize and analyze the large amounts of data available on the Internet and related to microarray experiments. DILIB can link hundreds of differentially expressed genes, through their Single Identifier or GenBank accession number, to hundreds of Medline records, and can automatically retrieve, analyze, and compare thousands of non-trivial descriptors related to gene clusters. RESULTS: As an example, using a frequency comparison of Medical Subject Headings (MeSH) and Registry Number descriptors, we reanalyzed the involvement of 'integrin', 'interleukin' and 'CD Antigens' in mesotheliomas. DILIB allowed us to: (i) associate literature with expressed genes, (ii) link functional transcriptomes across various experiments, (iii) associate specific descriptors with experiments, (iv) define new research areas, and eventually (v) find new functions for co-expressed genes. CONCLUSIONS: We propose a new concept, 'bibliomics', representing a subset of high-quality and rare information, retrieved and organized by systematic literature-searching tools from existing databases, and related to a subset of genes functioning together in '-omic' sciences.
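
The frequency comparison of descriptors mentioned under RESULTS can be illustrated with a minimal Python sketch. This is not DILIB's implementation, which is not described here; the function names, the data layout (one list of descriptors per Medline record) and the toy records are all assumptions made purely for illustration.

```python
from collections import Counter


def descriptor_frequencies(records):
    """Return, for each descriptor, the fraction of records that carry it.

    `records` is a list of descriptor lists, one per bibliographic record
    (hypothetical layout, assumed for this sketch).
    """
    counts = Counter()
    for descriptors in records:
        # Count each descriptor at most once per record.
        counts.update(set(descriptors))
    total = len(records)
    return {d: c / total for d, c in counts.items()} if total else {}


def compare_frequencies(cluster_a, cluster_b):
    """List (descriptor, freq_a, freq_b), largest frequency gap first."""
    freq_a = descriptor_frequencies(cluster_a)
    freq_b = descriptor_frequencies(cluster_b)
    rows = [
        (d, freq_a.get(d, 0.0), freq_b.get(d, 0.0))
        for d in set(freq_a) | set(freq_b)
    ]
    # Sort by absolute difference (descending), then by name for stability.
    return sorted(rows, key=lambda r: (-abs(r[1] - r[2]), r[0]))


# Toy records, invented for illustration only (not data from the study).
cluster_a = [["Integrins", "Antigens, CD"], ["Integrins"], ["Interleukins"]]
cluster_b = [["Interleukins"], ["Interleukins", "Antigens, CD"]]

for descriptor, fa, fb in compare_frequencies(cluster_a, cluster_b):
    print(f"{descriptor}: {fa:.2f} vs {fb:.2f}")
```

Descriptors whose frequency differs most between the two gene clusters appear first, which is one simple way to surface terms that are specific to one experiment. A real analysis would also need a statistical test and a correction for multiple comparisons, neither of which is shown here.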