Data Mining the NCI Cancer Cell Line Compound GI50 Values: Identifying Quinone Subtypes Effective Against Melanoma and Leukemia Cell Classes

Using data mining techniques, we studied a subset of 1400 compounds from the large public National Cancer Institute (NCI) compound data repository. We first assigned a functional class identity to each of the 60 NCI cancer testing cell lines via hierarchical clustering of gene expression data. The 60 cell lines represent nine clinical tissue types. They were placed into six classes: melanoma, leukemia, renal, lung, colorectal, and a sixth class of mixed-tissue cell lines not found in any of the other five classes. We then carried out supervised machine learning using the GI50 values measured on the panel of 60 NCI cancer cell lines. For separate 3-class and 2-class clustering problems, we achieved clear separation of the cell line classes at high stringency (p < 0.01, Bonferroni-corrected t-statistic). We did this with feature reduction clustering algorithms embedded in RadViz, an integrated high-dimensional analysis and visualization tool. We started with the GI50 values of all 1400 compounds as input and retained only those compounds, or features, that were significant in carrying out the classification. With this approach, we identified two small sets of compounds that were most effective in completely separating the melanoma class from the non-melanoma class and the leukemia class from the non-leukemia class. To validate these results, we showed that the GI50 values of these two compound sets were highly accurate classifiers under five standard analytical algorithms. One compound set (14 compounds) was most effective against the melanoma class cell lines, and the other (30 compounds) was most effective against the leukemia class cell lines. Each of the two compound sets was significantly enriched in a different type of substituted p-quinone.
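The abstract does not state the linkage rule or distance measure used to cluster the gene expression data. The following is a minimal sketch that assumes average linkage on a 1 - Pearson r distance. This is a common choice for expression profiles, but the source does not confirm it. The helper names and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of cell lines (hypothetical helper).

    profiles maps a cell-line name to its expression vector. The distance
    between two cell lines is 1 - Pearson r; the two clusters with the
    smallest mean pairwise distance are merged until n_clusters remain.
    """
    names = list(profiles)
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(profiles[a], profiles[b])
        dist[(a, b)] = dist[(b, a)] = d

    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
            avg = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i (i < j)
        del clusters[j]
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    # Toy expression profiles, invented for illustration (not NCI data).
    toy = {
        "mel_A": [5.0, 1.0, 4.0, 0.5],
        "mel_B": [4.8, 1.2, 4.1, 0.4],
        "leu_A": [0.5, 4.0, 1.0, 5.0],
        "leu_B": [0.6, 4.2, 0.9, 4.7],
    }
    print(average_linkage(toy, 2))  # prints [['mel_A', 'mel_B'], ['leu_A', 'leu_B']]
```

Cutting the merge process at a fixed number of clusters corresponds to cutting the dendrogram at one height. For the 60 cell lines, that height would be chosen to give the six classes described above.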
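The selection criterion is p < 0.01 after Bonferroni correction of a t-statistic. The sketch below makes it concrete under stated assumptions. It uses a two-sided Welch t-test comparing one class of cell lines against all others, for each compound; the abstract does not say which t-test variant was used. A compound is kept when its raw p-value multiplied by the number of compounds tested stays below 0.01. With 1400 compounds, that means raw p < 0.01/1400, roughly 7.1e-6. The t tail probability is computed from the regularized incomplete beta function, so only the standard library is needed. All function names and the dictionary layout are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p(t, df):
    """Two-sided p-value of a Student t statistic with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def welch_t(x, y):
    """Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((a - m1) ** 2 for a in x) / (n1 - 1)
    v2 = sum((b - m2) ** 2 for b in y) / (n2 - 1)
    se2 = v1 / n1 + v2 / n2  # assumed > 0 (the data are not all identical)
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def select_compounds(gi50, in_class, alpha=0.01):
    """Keep compounds whose Bonferroni-corrected p-value is below alpha.

    gi50 maps compound id -> {cell line: log GI50}; in_class is the set of
    cell lines in the target class (e.g. melanoma). Returns (id, t, p)
    tuples sorted by raw p-value.
    """
    m = len(gi50)  # number of hypotheses tested
    kept = []
    for cpd, by_line in gi50.items():
        x = [v for line, v in by_line.items() if line in in_class]
        y = [v for line, v in by_line.items() if line not in in_class]
        t, df = welch_t(x, y)
        p = two_sided_p(t, df)
        if min(1.0, p * m) < alpha:
            kept.append((cpd, t, p))
    return sorted(kept, key=lambda row: row[2])
```

For the melanoma vs. non-melanoma problem, `in_class` would hold the melanoma-class cell lines, and `gi50` would hold one entry per screened compound.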
Of the 14 compounds in the melanoma set, 11 were internal substituted p-quinones; of the 30 compounds in the leukemia set, 6 were external substituted p-quinones. Attempts to subclassify melanoma or leukemia cell lines by clinical cancer subtype met with limited success. For example, using the GI50 values of the 30 compounds we identified as effective against all leukemia cell lines, we could separate cell lines of acute lymphoblastic leukemia (ALL) origin from those of non-ALL leukemia origin without significant overlap with non-leukemia cell lines. We also clustered the compounds by their GI50 values across the 60 cancer cell lines, as laid out by the RadViz algorithm. In that layout, these two compound subsets did not overlap with clusters containing any of the NCI's 92 compounds of known mechanism of action, a few of which are quinones. Given their structural patterns, the two p-quinone subtypes we identified would be expected to differ in redox potential and in substrate specificity for enzymatic reduction in vivo. These two p-quinone subtypes may therefore help elucidate pharmacophores for the design of compounds to treat these two cancer tissue types in the clinic.
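The abstract reports that 11 of the 14 melanoma-set compounds and 6 of the 30 leukemia-set compounds are substituted p-quinones. It does not say which test established the enrichment, or how many compounds of each quinone subtype occur among the 1400 screened compounds. One standard way to quantify such enrichment is a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below uses exact integer arithmetic. The library-wide subtype count `K` must come from the compound library itself, and the function name is hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: compounds of the quinone subtype in the selected set
    n: size of the selected set (e.g. 14 for the melanoma set)
    K: compounds of that subtype in the whole screened library
    N: size of the screened library (e.g. 1400)
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("inconsistent counts")
    upper = min(K, n)
    # math.comb returns 0 when asked for more items than exist, so
    # impossible draws contribute nothing to the tail.
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(max(k, 0), upper + 1))
    return tail / comb(N, n)  # exact integers until this final division


if __name__ == "__main__":
    # Toy check: drawing 5 of 10 items, 5 of which are "hits", and getting
    # all 5 hits has probability 1 / C(10, 5) = 1 / 252.
    print(enrichment_p(5, 5, 5, 10))
    # For the melanoma set one would call enrichment_p(11, 14, K, 1400),
    # with K the library-wide count of internal substituted p-quinones.
```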
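RadViz places each feature (here, each selected compound) as an anchor on the unit circle. It draws each record (here, each cell line) at the weighted mean of the anchor positions, with weights given by the record's normalized feature values. Cell lines that respond strongly to the same compounds are therefore pulled toward the same arc. The sketch below shows only this basic projection and assumes per-feature min-max scaling. It does not reproduce the anchor-ordering or feature-reduction steps of the RadViz tool used in the study, and the function name is hypothetical. How GI50 is oriented before scaling is also an assumption. For example, using -log10 GI50 gives more potent compounds larger weights.

```python
import math


def radviz(rows):
    """Project equal-length feature vectors onto the plane, RadViz style.

    Returns (anchors, points): one unit-circle anchor per feature and one
    2-D point per row.
    """
    k = len(rows[0])
    # Feature anchors spaced evenly around the unit circle.
    anchors = [(math.cos(2 * math.pi * j / k), math.sin(2 * math.pi * j / k))
               for j in range(k)]
    # Min-max scale each feature to [0, 1] across all rows.
    lo = [min(r[j] for r in rows) for j in range(k)]
    hi = [max(r[j] for r in rows) for j in range(k)]
    points = []
    for r in rows:
        w = [(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(k)]
        total = sum(w)
        if total == 0.0:
            points.append((0.0, 0.0))  # all weights zero: place at the center
            continue
        x = sum(wj * ax for wj, (ax, _) in zip(w, anchors)) / total
        y = sum(wj * ay for wj, (_, ay) in zip(w, anchors)) / total
        points.append((x, y))
    return anchors, points


if __name__ == "__main__":
    # Toy -log10 GI50 values for 4 cell lines x 4 compounds (invented).
    rows = [[8.0, 5.0, 5.0, 5.0],
            [5.0, 8.0, 5.0, 5.0],
            [6.5, 6.5, 5.0, 5.0],
            [5.0, 5.0, 5.0, 5.0]]
    anchors, points = radviz(rows)
    for p in points:
        print("(%.2f, %.2f)" % p)
```

A cell line that is sensitive to only one compound lands on that compound's anchor. One equally sensitive to two compounds lands midway between them. This is why compound subsets that discriminate a class show up as separate regions of the layout.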
