An intuitive graphical visualization technique for the interrogation of transcriptome data

The complexity of gene expression data generated from microarrays and high-throughput sequencing makes their analysis challenging. One goal of these analyses is to define sets of co-regulated genes and identify patterns of gene expression. To date, however, few easily implemented methods allow an investigator to visualize and interact with the data in an intuitive and flexible manner. Here, we show that combining a nonlinear dimensionality reduction method, t-distributed Stochastic Neighbor Embedding (t-SNE), with a novel visualization technique provides a graphical mapping that allows the intuitive investigation of transcriptome data. This approach performs better than commonly used methods, offering insight into underlying patterns of gene expression at both global and local scales and identifying clusters of similarly expressed genes. A MATLAB-implemented graphical user interface for performing t-SNE and nearest neighbour plots on genomic data sets is freely available at www.nimr.mrc.ac.uk/research/james-briscoe/visgenex.
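The core of t-SNE (van der Maaten and Hinton, 2008) fits in a few lines, which may help readers unfamiliar with the method. The following is a minimal NumPy sketch under stated assumptions, not the authors' MATLAB implementation. Each row of `X` is taken to be one gene's expression profile. Pairwise distances are converted into Gaussian neighbour probabilities whose widths are tuned to a target perplexity. Points are then placed in 2-D so that heavy-tailed Student-t neighbour probabilities match them, using momentum gradient descent on the Kullback-Leibler divergence. The function names `high_dim_affinities` and `tsne`, the perplexity of 5, the 100-iteration early-exaggeration phase and the learning-rate heuristic are all illustrative assumptions. The nearest neighbour plots are not reproduced here.

```python
import numpy as np


def high_dim_affinities(X, perplexity=5.0, tol=1e-5, max_iter=50):
    """Joint probabilities p_ij from Gaussian kernels tuned to a target perplexity."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows (clipped at 0 for round-off)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    P = np.zeros((n, n))
    target = np.log(perplexity)  # target entropy in nats, since perplexity = exp(H)
    for i in range(n):
        d = np.delete(D[i], i)  # squared distances from point i to every other point
        beta, lo, hi = 1.0, 0.0, np.inf  # beta = 1 / (2 * sigma_i^2), found by bisection
        for _ in range(max_iter):
            p = np.exp(-(d - d.min()) * beta)  # shifting by the minimum avoids underflow
            p /= p.sum()
            entropy = -np.sum(p * np.log(np.maximum(p, 1e-300)))
            if abs(entropy - target) < tol:
                break
            if entropy > target:  # distribution too flat: narrow the kernel
                lo = beta
                beta = beta * 2.0 if np.isinf(hi) else (beta + hi) / 2.0
            else:  # distribution too peaked: widen the kernel
                hi = beta
                beta = (beta + lo) / 2.0
        P[i, np.arange(n) != i] = p  # conditional probabilities p_{j|i}
    # Symmetrise into joint probabilities; all entries sum to 1
    return (P + P.T) / (2.0 * n)


def tsne(X, n_components=2, perplexity=5.0, n_iter=500, exaggeration=4.0,
         learning_rate=None, seed=0):
    """Embed the rows of X by momentum gradient descent on KL(P || Q)."""
    n = X.shape[0]
    if learning_rate is None:
        # Assumed heuristic: scale the step with n so early updates stay stable
        learning_rate = max(n / (4.0 * exaggeration), 1.0)
    P = high_dim_affinities(X, perplexity)
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, n_components))
    velocity = np.zeros_like(Y)
    for it in range(n_iter):
        early = it < 100  # early exaggeration phase (assumed length)
        alpha = exaggeration if early else 1.0
        momentum = 0.5 if early else 0.8
        sq = np.sum(Y ** 2, axis=1)
        # Student-t kernel (1 + |y_i - y_j|^2)^-1 in the low-dimensional map
        num = 1.0 / (1.0 + sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T)
        np.fill_diagonal(num, 0.0)
        Q = num / num.sum()
        # Gradient: dC/dy_i = 4 * sum_j (p_ij - q_ij) (1 + |y_i - y_j|^2)^-1 (y_i - y_j)
        W = (alpha * P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y
```

Calling `tsne(X)` on an n-by-m expression matrix returns an n-by-2 array of coordinates that can be scatter-plotted. Because the distance and kernel matrices are dense n-by-n arrays, this naive version is only practical for up to a few thousand genes.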
