t-Distributed Stochastic Neighbor Embedding (t-SNE): A tool for eco-physiological transcriptomic analysis.

High-throughput RNA sequencing (RNA-Seq) has transformed the ecophysiological assessment of individual plankton species and communities. However, the technology generates complex data consisting of millions of short-read sequences that can be difficult to analyze and interpret. New bioinformatics workflows are needed to guide experimentation and environmental sampling, and to develop and test hypotheses. One complexity-reducing tool that has been used successfully in other fields is t-distributed Stochastic Neighbor Embedding (t-SNE). Its application to transcriptomic data from marine pelagic and benthic systems has yet to be explored. The present study demonstrates the use of t-SNE for evaluating RNA-Seq data, drawing on previously published, conventionally analyzed studies of the copepods Calanus finmarchicus and Neocalanus flemingeri. Gene expression profiles were compared in three applications: among developmental stages, among experimental conditions, and among environmental samples from different locations. The profile categories identified by t-SNE were validated against published results from differential gene expression and Gene Ontology (GO) analyses. The analyses demonstrate how individual samples can be evaluated for differences in global gene expression, as well as for differences in expression related to specific biological processes, such as lipid metabolism and responses to stress. As RNA-Seq data from plankton species and communities become more common, t-SNE analysis should provide a powerful tool for determining trends and classifying samples into groups with similar transcriptional physiology, independent of collection site or time.
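As a rough illustration of the kind of analysis described above, the following minimal sketch embeds a samples-by-genes expression matrix in two dimensions with t-SNE. It is not the study's workflow. The data are synthetic stand-ins for log-transformed expression values, the three sample groups are hypothetical, and the use of scikit-learn's `TSNE` with a perplexity of 5 is an assumption made for illustration only.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Synthetic stand-in for log-transformed RNA-Seq data (assumption, not real data):
# 30 samples x 500 genes, drawn from three hypothetical sample groups.
n_per_group, n_genes = 10, 500
group_means = rng.normal(0.0, 2.0, size=(3, n_genes))
log_expr = np.vstack([
    mean + rng.normal(0.0, 0.5, size=(n_per_group, n_genes))
    for mean in group_means
])
labels = np.repeat(np.arange(3), n_per_group)

# Perplexity must be smaller than the number of samples; 5 is an
# illustrative choice for a small sample set, not a recommended value.
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
embedding = tsne.fit_transform(log_expr)  # shape: (n_samples, 2)

# Summarize where each hypothetical group lands in the 2D map.
for g in range(3):
    centroid = embedding[labels == g].mean(axis=0)
    print(f"group {g}: centroid = {np.round(centroid, 1)}")
```

In practice, each row of `log_expr` would be one sequenced sample and each column one transcript. Because t-SNE output depends on perplexity and random initialization, groupings seen in the map are best checked across several settings and against independent analyses, such as differential gene expression.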
