Comparison of Visualization Methods of Genome-wide SNP Profiles in Childhood Acute Lymphoblastic Leukaemia

Data mining and knowledge discovery have been applied to datasets in many domains, including biomedicine. Modelling, data mining and visualization address the problem of extracting knowledge from large and complex biomedical data. The current challenge with such data is to develop statistical and data mining methods that can search and browse the underlying patterns within it. In this paper, we apply several state-of-the-art data reduction methods to visualize genome-wide Single Nucleotide Polymorphism (SNP) datasets, using them to address the problems induced by the high dimensionality of large genetic variation data. The visualization approach was selected based on the trustworthiness of the resulting visualizations. On this metric, the Neighbor Retrieval Visualizer (NeRV), a method that optimizes the retrieval quality of Stochastic Neighbor Embedding, outperformed the other methods. The quality measure for the NeRV visualization showed excellent results even though the dataset was reduced from 13917 to 2 dimensions. The visualization results will assist clinicians and biomedical researchers in understanding the systems biology of patients and in comparing the groups of clusters that appear in the visualizations.
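The trustworthiness criterion mentioned above can be illustrated with a short sketch. The Python code below is a minimal, illustrative implementation of the trustworthiness measure as it is commonly defined in the neighbourhood-preservation literature; it is not the code used in the paper. The function names (`pairwise_sq_dists`, `trustworthiness`, `pca_2d`), the use of Euclidean distance, the neighbourhood size `k`, and the synthetic genotype matrix are all assumptions made for illustration, and PCA stands in for NeRV only because it needs nothing beyond NumPy.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (assumed metric)."""
    sq = np.sum(X * X, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def trustworthiness(X_high, X_low, k):
    """Trustworthiness of a projection X_low of the data X_high.

    T(k) = 1 - 2 / (n k (2n - 3k - 1)) * sum_i sum_{j in U_k(i)} (r(i, j) - k)
    where U_k(i) holds the points that are among the k nearest neighbours of i
    in the projection but not in the original space, and r(i, j) is the rank
    of j among the neighbours of i in the original space.
    """
    n = X_high.shape[0]
    if not 0 < k < n / 2:
        raise ValueError("k must satisfy 0 < k < n / 2")

    D_high = pairwise_sq_dists(X_high)
    D_low = pairwise_sq_dists(X_low)
    # Exclude each point from its own neighbourhood.
    np.fill_diagonal(D_high, np.inf)
    np.fill_diagonal(D_low, np.inf)

    rows = np.arange(n)[:, None]
    # ranks_high[i, j] = rank of j among neighbours of i (1 = nearest).
    order_high = np.argsort(D_high, axis=1)
    ranks_high = np.empty((n, n), dtype=int)
    ranks_high[rows, order_high] = np.arange(1, n + 1)[None, :]

    # k nearest neighbours of every point in the projection.
    nn_low = np.argsort(D_low, axis=1)[:, :k]
    # Only neighbours whose original rank exceeds k contribute a penalty.
    penalty = np.sum(np.maximum(ranks_high[rows, nn_low] - k, 0))
    return 1.0 - 2.0 / (n * k * (2 * n - 3 * k - 1)) * penalty


def pca_2d(X):
    """Project X onto its first two principal components (stand-in method)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * S[:2]


if __name__ == "__main__":
    # Synthetic genotype-like matrix (values 0, 1, 2); NOT the paper's data.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(40, 200)).astype(float)
    print("trustworthiness of 2-D PCA projection:", trustworthiness(X, pca_2d(X), k=5))
```

A value of 1 means that no point enters a projected neighbourhood without also being a neighbour in the original space; lower values indicate that the visualization shows spurious neighbours. Comparing this score across projections of the same data is one way the methods could be ranked.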
