Mining Gene Expression Database for Primary Human Disease Tissues

Studies of gene expression in primary human disease tissue often span several years in order to reach reasonably large sample sizes and to collect patient clinical information, which makes these data particularly valuable. Because no central repository exists, the data have been available after publication only through disparate, non-public sources. We developed the disease-to-gene expression mapper (D-GEM) as a publicly accessible database and data mining toolbox for microarray data from primary human disease tissue. A statistical pipeline has also been implemented to identify genes over-expressed in disease tissue samples compared with normal control samples, or genes whose expression values are associated with clinical parameters such as patient survival rate. One potential application of these data is the identification of pathway-specific cancer prognosis markers. By applying a novel approach, we identified and confirmed gene signatures for cancer prognosis in the context of known biological pathways in cancer development.
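The abstract does not specify which statistics the pipeline uses to relate expression to survival. As a minimal sketch of one common way such an association can be examined, the Python code below assumes a Kaplan-Meier (product-limit) survival estimate computed separately for patients with low and high expression of a single gene. The function names, the median cutoff, and the input layout are illustrative assumptions and are not taken from D-GEM.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set.
        n_at_risk -= removed
    return curve


def split_by_median_expression(expression, times, events):
    """Split patients into low/high groups at the median expression value
    (an assumed cutoff, chosen only for illustration)."""
    cutoff = median(expression)
    low = ([], [])
    high = ([], [])
    for x, t, e in zip(expression, times, events):
        group = high if x > cutoff else low
        group[0].append(t)
        group[1].append(e)
    return low, high
```

In use, `split_by_median_expression` would be called once per gene, and `kaplan_meier` applied to each group; a formal comparison of the two curves (for example with a log-rank test) would be a separate step not shown here.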
