A data analysis framework for biomedical big data: Application on mesoderm differentiation of human pluripotent stem cells

The development of high-throughput biomolecular technologies has resulted in the generation of vast amounts of omics data at an unprecedented rate. This is transforming biomedical research into a big data discipline, in which the main challenges lie in analyzing and interpreting these data to produce new biological knowledge. The aim of this study was to develop a framework for biomedical big data analytics and to apply it to transcriptomics time series data from the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages. To this end, transcriptome profiling by microarray was performed on differentiating human pluripotent stem cells sampled on eleven consecutive days. The gene expression data were analyzed using the five-stage analysis framework proposed in this study: data preparation, exploratory data analysis, confirmatory analysis, biological knowledge discovery, and visualization of the results. Clustering analysis revealed several distinct expression profiles during differentiation. Genes with an early transient response, for example CER1 and NODAL, were strongly related to embryonic and mesendoderm development. Pluripotency genes, such as NANOG and SOX2, were substantially downregulated shortly after the onset of differentiation. Rapid induction of genes related to metal ion response, cardiac tissue development, and muscle contraction was observed around days 5 and 6. Several transcription factors were identified as potential regulators of these processes, e.g. POU1F1, TCF4 and TBP for the muscle contraction genes. Pathway analysis revealed temporal activity of several signaling pathways, for example the inhibition of WNT signaling on day 2 and its reactivation on day 4. This study provides a comprehensive characterization of biological events and key regulators in the early differentiation of human pluripotent stem cells towards the mesoderm and cardiac lineages.
The proposed analysis framework can be used to structure data analysis in future research, both in stem cell differentiation and, more generally, in biomedical big data analytics.
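The abstract names the five stages of the framework but does not specify their algorithms. As a rough illustration only, the sketch below shows how three of the stages might be chained for time-series expression profiles:

- **Data preparation:** a variance filter followed by per-gene z-scoring.
- **Exploratory clustering:** plain Lloyd's k-means with farthest-point seeding, chosen for simplicity and not necessarily the study's algorithm.
- **Knowledge discovery:** a one-sided hypergeometric over-representation test, as one possible form of this stage.

Everything else in the sketch is hypothetical: the toy profiles, the gene names `g1` to `g7`, the `min_sd` threshold and every function name. Confirmatory analysis and visualization are omitted.

```python
"""Hypothetical sketch of three stages of a time-series expression pipeline.

Assumptions (not taken from the study): values are already log-scaled,
preparation = variance filter + per-gene z-scoring, exploratory analysis =
Lloyd's k-means with farthest-point seeding, knowledge discovery = one-sided
hypergeometric over-representation test. All names here are illustrative.
"""
import math
import statistics
from typing import Dict, List, Sequence, Set

# Synthetic toy profiles over 11 sampling days; NOT data from the study.
TOY_PROFILES: Dict[str, List[float]] = {
    "g1": [5, 9, 8, 6, 5, 5, 5, 5, 5, 5, 5],    # early transient peak
    "g2": [4, 8, 9, 6, 4, 4, 4, 4, 4, 4, 4],    # early transient peak
    "g3": [9, 8, 6, 4, 3, 3, 3, 3, 3, 3, 3],    # early decrease
    "g4": [8, 7, 5, 3, 2, 2, 2, 2, 2, 2, 2],    # early decrease
    "g5": [2, 2, 2, 2, 2, 6, 8, 8, 8, 8, 8],    # later induction
    "g6": [3, 3, 3, 3, 3, 7, 9, 9, 9, 9, 9],    # later induction
    "g7": [5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5.1],  # near-flat, filtered out
}


def zscore(profile: Sequence[float]) -> List[float]:
    """Standardize one profile so clustering compares shape, not level."""
    mu = statistics.fmean(profile)
    sd = statistics.pstdev(profile)  # > 0 because prepare() filters flat genes
    return [(x - mu) / sd for x in profile]


def prepare(data: Dict[str, Sequence[float]], min_sd: float = 0.5) -> Dict[str, List[float]]:
    """Stage 1: drop near-flat genes (min_sd must be > 0), then z-score the rest."""
    return {g: zscore(p) for g, p in data.items() if statistics.pstdev(p) >= min_sd}


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles: Dict[str, List[float]], k: int, iters: int = 50) -> Dict[str, int]:
    """Stage 2: Lloyd's k-means; seeds chosen by deterministic farthest-point rule."""
    genes = sorted(profiles)
    seeds = [genes[0]]
    while len(seeds) < k:
        # Next seed: the gene whose nearest existing seed is farthest away.
        seeds.append(max(genes, key=lambda g: min(sq_dist(profiles[g], profiles[s]) for s in seeds)))
    centroids = [list(profiles[s]) for s in seeds]
    labels: Dict[str, int] = {}
    for _ in range(iters):
        # Assign each gene to its nearest centroid.
        new = {g: min(range(k), key=lambda c: sq_dist(profiles[g], centroids[c])) for g in genes}
        if new == labels:
            break  # assignments stable: converged
        labels = new
        for c in range(k):
            members = [profiles[g] for g in genes if labels[g] == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return labels


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    hits = sum(math.comb(n, i) * math.comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / math.comb(M, N)


def enrichment(cluster: Set[str], term: Set[str], universe: Set[str]) -> float:
    """Stage 4: over-representation p-value of an annotation term in one cluster."""
    cluster, term = cluster & universe, term & universe
    return hypergeom_sf(len(cluster & term), len(universe), len(term), len(cluster))


if __name__ == "__main__":
    labels = kmeans(prepare(TOY_PROFILES), k=3)
    print(labels)
    cluster_of_g1 = {g for g, c in labels.items() if c == labels["g1"]}
    term = {"g1", "g2", "g7"}  # hypothetical gene set, e.g. a GO term or pathway
    print(enrichment(cluster_of_g1, term, set(TOY_PROFILES)))
```

Z-scoring before clustering groups genes by the shape of their temporal response rather than by their absolute expression level. This matches the kind of profile grouping described in the abstract, such as early transient versus downregulated genes.

With the toy data, the outcome is as follows:

- The transient profiles (`g1`, `g2`) fall into one cluster.
- The decreasing profiles (`g3`, `g4`) fall into another.
- The late-induced profiles (`g5`, `g6`) form a third.
- The near-flat `g7` is removed during preparation.

In practice, p-values from many terms would also need multiple-testing correction.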
