Data-Driven Analysis of Collections of Big Datasets by the Bi-CoPaM Method Yields Field-Specific Novel Insights

Massive amounts of data have recently been, and are increasingly being, generated in various fields, such as bioinformatics, neuroscience and social networks. Many of these big datasets were generated to answer specific research questions, and were analysed accordingly. However, the information contained in these datasets can usually answer much broader questions than those originally intended. Moreover, many existing big datasets are related to each other but differ in their detailed specifications, and the mutual information that can be extracted from them collectively has not commonly been considered. To bridge this gap between the fast pace of data generation and the slower pace of data analysis, and to exploit the massive amounts of existing data, we suggest employing data-driven explorations to analyse collections of related big datasets. This approach aims to extract field-specific novel findings that the data can reveal without being driven by specific questions or hypotheses. To realise this paradigm, we introduced the binarisation of consensus partition matrices (Bi-CoPaM) method, which can analyse collections of heterogeneous big datasets to identify clusters of consistently correlated objects. We demonstrate the power of data-driven explorations by applying the Bi-CoPaM to two collections of big datasets from two distinct fields, namely bioinformatics and neuroscience. In the first application, the collective analysis of forty yeast gene expression datasets identified a novel cluster of genes and some new biological hypotheses regarding their function and regulation. In the second application, the analysis of 1,856 big fMRI datasets identified three functionally connected neural networks related to the visual, reward and auditory systems during affective processing. These experiments reveal the broad applicability of this paradigm to various fields, and thus encourage exploring the large amounts of partially exploited existing datasets with a similar approach, preferably as collections of related datasets.
