Multidimensional scaling with discrimination coefficients for supervised visualization of high-dimensional data

Visualization techniques for high-dimensional data sets play a pivotal role in exploratory analysis across a wide range of disciplines. Gene expression data from microarray technology pose a particularly challenging problem: the number of features (genes) typically exceeds 20,000, whereas the number of samples is frequently below 200. We investigated class-specific discrimination coefficients, computed for each feature and each pair of classes, as the basis for an effective nonlinear mapping to a lower-dimensional space. We applied the technique to three microarray data sets and compared the resulting two-dimensional projections with those from a conventional multidimensional scaling method, a score plot from principal component analysis, and projections from linear discriminant analysis. In these experiments, the discrimination coefficients yielded an improved visualization of high-dimensional genomic data.
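
The general idea of weighting features per pair of classes before scaling can be illustrated with a minimal sketch. The abstract does not define the discrimination coefficient or the scaling algorithm, so everything specific below is an assumption made for illustration: the coefficient is taken to be a signal-to-noise-like ratio, |mean_a - mean_b| / (std_a + std_b), normalized to sum to one; samples from the same class are compared with uniform weights; and classical (Torgerson) multidimensional scaling stands in for the nonlinear mapping. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def discrimination_coefficients(X, y):
    """Per-feature weights for every pair of classes.

    ASSUMPTION: the coefficient is |mean_a - mean_b| / (std_a + std_b),
    normalized to sum to 1. Same-class pairs get uniform weights.
    """
    classes = np.unique(y)
    n_features = X.shape[1]
    coeffs = {}
    for i, a in enumerate(classes):
        for b in classes[i:]:
            if a == b:
                w = np.ones(n_features)
            else:
                Xa, Xb = X[y == a], X[y == b]
                w = np.abs(Xa.mean(axis=0) - Xb.mean(axis=0))
                w = w / (Xa.std(axis=0) + Xb.std(axis=0) + 1e-12)
            w = w / max(w.sum(), 1e-12)
            coeffs[(a, b)] = coeffs[(b, a)] = w
    return coeffs


def weighted_distances(X, y, coeffs):
    """Pairwise weighted Euclidean dissimilarities between samples."""
    n = X.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = coeffs[(y[i], y[j])]
            D[i, j] = D[j, i] = np.sqrt(np.sum(w * (X[i] - X[j]) ** 2))
    return D


def classical_mds(D, dim=2):
    """Classical (Torgerson) MDS via double centering and eigendecomposition."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dim]           # keep the largest `dim`
    # Clip negative eigenvalues, which can occur for non-Euclidean dissimilarities.
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))


# Synthetic stand-in for expression data: 40 samples, 500 features,
# of which only the first 10 differ between the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 500))
X[y == 1, :10] += 2.0

coeffs = discrimination_coefficients(X, y)
D = weighted_distances(X, y, coeffs)
Y = classical_mds(D, dim=2)                      # 40 x 2 embedding for plotting
```

Because the weights depend on the class labels of both samples, the resulting dissimilarities need not satisfy the triangle inequality, which is why the sketch clips negative eigenvalues. An iterative stress-minimizing method could be substituted for the classical MDS step without changing the rest of the pipeline.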
