A Bioinformatic Approach to the Identification of Candidate Genes for the Development of New Cancer Diagnostics

Abstract: A multivariate analysis of the National Cancer Institute gene expression database is reported here. The soft independent modelling of class analogy (SIMCA) approach classified cell lines according to histological origin. Principal component analysis (PCA), based on the expression of 9605 genes and expressed sequence tags (ESTs), classified colon, leukaemia, renal, melanoma and CNS cells, but not lung, breast or ovarian cells. Another multivariate procedure, partial least squares discriminant analysis (PLS-DA), provides bioinformatic clues for selecting a limited number of gene transcripts that are most effective in discriminating between tumour histotypes. Among these transcripts it is possible to identify candidates for the development of new diagnostic tests for cancer detection, as well as unknown genes that deserve high priority in further studies. In particular, melan-A, acid phosphatase 5, dopachrome tautomerase, S100-β and acid ceramidase were found to be among the most important genes for melanoma. The potential of this bioinformatic approach is exemplified by its ability to identify differentiation and diagnostic markers already in clinical use, such as protein S-100, a prognostic parameter in patients with metastatic melanoma and a screening marker for melanoma metastasis.
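The gene-selection idea behind PLS-DA can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the function name `pls_da_weights` and all variable names are hypothetical, and ranking genes by their squared weight on the first component is a simplifying assumption rather than the criterion used in the study.

```python
import numpy as np

def pls_da_weights(X, labels, n_components=2):
    """Illustrative PLS-DA: PLS against a one-hot class matrix.

    Returns the gene weight matrix W (genes x components) and the class names.
    """
    classes = np.unique(labels)
    # One-hot (dummy) class membership matrix, one column per class
    Y = (labels[:, None] == classes[None, :]).astype(float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    weights = []
    for _ in range(n_components):
        # First left singular vector of X'Y: the unit-norm gene weights whose
        # scores have maximal covariance with the class matrix
        u, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
        w = u[:, 0]
        t = Xc @ w                      # sample scores on this component
        tt = t @ t
        p = Xc.T @ t / tt               # X loadings
        q = Yc.T @ t / tt               # Y loadings
        Xc = Xc - np.outer(t, p)        # deflate X
        Yc = Yc - np.outer(t, q)        # deflate Y
        weights.append(w)
    return np.column_stack(weights), classes

# Synthetic stand-in for an expression matrix: 60 samples x 200 genes,
# three hypothetical classes; genes 0-4 are raised in class_a only.
rng = np.random.default_rng(0)
labels = np.repeat(np.array(["class_a", "class_b", "class_c"]), 20)
X = rng.normal(size=(60, 200))
X[labels == "class_a", :5] += 3.0

W, classes = pls_da_weights(X, labels, n_components=2)

# Assumed ranking rule: squared weight on the first component
scores = W[:, 0] ** 2
top_genes = np.argsort(scores)[::-1][:5]
print("top-ranked gene indices:", sorted(top_genes.tolist()))
```

In this toy setting the five genes that were shifted in one class receive the largest weights, which mirrors how a small set of discriminating transcripts can be short-listed from thousands of measured genes.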
