Multidimensional Scaling for Genomic Data

Scientists working with genomic data face the challenge of analyzing and understanding an ever-increasing amount of data. Multidimensional scaling (MDS) refers to the representation of high-dimensional data in a low-dimensional space that preserves the similarities between data points. Metric MDS algorithms aim to embed the data so that inter-point distances are as close as possible to the input dissimilarities. The computational complexity of most metric MDS methods is over O(n²), which restricts their application to large genomic data sets (n ≫ 10⁶). Non-metric MDS may be considered instead: it embeds inter-point distances using only the relative order of the input dissimilarities. A non-metric MDS method has lower complexity than a metric MDS method, although it does not preserve the true relationships. However, if the input dissimilarities are unreliable, too difficult to measure, or simply unavailable, non-metric MDS is the appropriate choice. In this paper, we give an overview of both metric and non-metric MDS methods and their application to genomic data analyses.
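
To make the metric case concrete, the following is a minimal sketch of one common way to fit a metric MDS embedding: SMACOF-style majorization with unit weights, which repeatedly applies the Guttman transform to reduce the raw stress (the sum of squared differences between input dissimilarities and embedded distances). It is an illustration chosen for brevity, not necessarily one of the methods surveyed in the paper; the function names (`pairwise`, `stress`, `smacof_step`, `metric_mds`) and the toy points are assumptions. Every step touches all n² pairs, which is the cost that becomes prohibitive for very large n.

```python
import math
import random

def pairwise(X):
    """Euclidean distance matrix of the rows of X (n x n entries)."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def stress(D, X):
    """Raw stress: squared mismatch between input dissimilarities D
    and the distances of the embedding X, summed over all pairs."""
    E = pairwise(X)
    n = len(D)
    return sum((D[i][j] - E[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))

def smacof_step(D, X):
    """One Guttman transform with unit weights.
    Majorization guarantees the raw stress does not increase."""
    n, k = len(X), len(X[0])
    E = pairwise(X)
    new_X = []
    for i in range(n):
        row = [0.0] * k
        for j in range(n):
            if i == j or E[i][j] == 0.0:
                continue  # coincident points contribute nothing
            ratio = D[i][j] / E[i][j]
            for c in range(k):
                row[c] += ratio * (X[i][c] - X[j][c])
        new_X.append([v / n for v in row])
    return new_X

def metric_mds(D, k=2, iters=200, seed=0):
    """Embed n objects in k dimensions from an n x n dissimilarity matrix."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    X = [[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in D]
    for _ in range(iters):
        X = smacof_step(D, X)
    return X

if __name__ == "__main__":
    # Hypothetical toy input: distances among five points in the plane.
    points = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 2.0]]
    D = pairwise(points)
    X = metric_mds(D, k=2)
    print("final stress:", stress(D, X))
```

A non-metric variant would replace the dissimilarities in `D` with values constrained only to respect their rank order (for example via monotone regression) before each update; that step is omitted here.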
