Discovering network relations in big time series with application to bioinformatics

Big Data concerns large-volume, complex and growing data sets with multiple, autonomous sources, and it is now expanding rapidly in all science and engineering domains [1]. Time series are an important class of big data, arising in applications such as medicine (electrocardiograms), the environment (daily temperatures) and finance (weekly sales totals, prices of mutual funds and stocks) [2], as well as in areas such as social networks and biology. Bioinformatics seeks to provide tools and analyses that facilitate the understanding of living systems by analyzing and correlating biological information. In particular, as increasingly large amounts of gene information have become available in recent years, more efficient algorithms for dealing with such big data in genomics are required [3]. There is growing interest in this field in discovering the network of regulations among a group of genes, known as a Gene Regulatory Network (GRN) [4], by analyzing gene expression profiles represented as time series. The GRNNminer method, proposed in [5], discovers the underlying GRN among a group of genes by modeling the temporal dynamics of the gene expression profiles with artificial neural networks. However, it requires building and training a pool of neural models for each possible gene-to-gene relationship, which results in a very large set of experiments of order O(n²), where n is the total number of genes involved. This work proposes to reduce this number of experiments dramatically, to O((n/k)²), when big time series are used to reconstruct a GRN, by first clustering the gene profiles into k groups using self-organizing maps (SOM) [6]. In this way, GRNNminer is applied to smaller sets of time series, namely only those that fall in the same cluster.
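To illustrate the cluster-then-pair idea (not GRNNminer itself), here is a minimal Python sketch using only the standard library. It trains a hand-written one-dimensional SOM with k units, assigns each gene to its best-matching unit, and lists only the gene pairs that share a cluster; the neural modeling step is replaced by simply counting the pairs that would need a model. The SOM hyperparameters (units on a line, Gaussian neighbourhood, linearly decaying learning rate), the function names `train_som`, `cluster_genes` and `candidate_pairs`, and the toy profiles are all assumptions made for this example, not details taken from [5] or [6]. Pairs are counted as unordered; if both regulation directions are modeled separately, every count doubles. With k clusters of about n/k genes each, each cluster needs on the order of (n/k)² pair models instead of n² for the exhaustive search.

```python
import math
import random
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: squared_distance(weights[j], x))


def train_som(profiles, k, epochs=50, lr0=0.5, seed=0):
    """Train a hypothetical 1-D SOM with k units; return the unit weight vectors."""
    rng = random.Random(seed)
    dim = len(profiles[0])
    # Initialise each unit with a copy of a randomly chosen profile.
    weights = [list(rng.choice(profiles)) for _ in range(k)]
    sigma0 = max(k / 2.0, 1.0)
    total_steps = epochs * len(profiles)
    step = 0
    for _ in range(epochs):
        order = list(range(len(profiles)))
        rng.shuffle(order)
        for i in order:
            x = profiles[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for j in range(k):
                # Gaussian neighbourhood measured on the 1-D grid of units.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    weights[j][d] += lr * h * (x[d] - weights[j][d])
            step += 1
    return weights


def cluster_genes(names, profiles, k, **kwargs):
    """Map each SOM unit index to the gene names whose profiles it wins."""
    weights = train_som(profiles, k, **kwargs)
    clusters = {}
    for name, x in zip(names, profiles):
        clusters.setdefault(best_matching_unit(weights, x), []).append(name)
    return clusters


def candidate_pairs(clusters):
    """Gene pairs still to be modeled: only pairs that share a cluster."""
    return [pair for members in clusters.values() for pair in combinations(members, 2)]


# Hypothetical toy data: 12 genes, 8 time points, three expression trends.
T = 8
names, profiles = [], []
for g in range(12):
    trend = g % 3
    offset = 0.02 * g
    if trend == 0:
        prof = [t / T + offset for t in range(T)]          # rising
    elif trend == 1:
        prof = [1.0 - t / T + offset for t in range(T)]    # falling
    else:
        prof = [0.5 + offset for _ in range(T)]            # flat
    names.append(f"g{g}")
    profiles.append(prof)

clusters = cluster_genes(names, profiles, k=3)
all_pairs = list(combinations(names, 2))   # exhaustive strategy: n(n-1)/2 pairs
pairs = candidate_pairs(clusters)          # cluster-restricted strategy
print(len(all_pairs), len(pairs))
```

For the 12 toy genes the exhaustive strategy needs 66 unordered pairs, while the cluster-restricted strategy needs only the within-cluster pairs. The saving comes from never modeling pairs of genes placed in different clusters, which is also why the quality of the clustering step matters.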

[1] Xindong Wu et al. Data Mining with Big Data. IEEE Transactions on Knowledge and Data Engineering, 2014.

[2] Claire J. Tomlin et al. Exact Reconstruction of Gene Regulatory Networks using Compressive Sensing. 2014.

[3] Georgina Stegmayer et al. Mining Gene Regulatory Networks by Neural Modeling of Expression Time-Series. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2015.

[4] C. Titus Brown et al. khmer: Working with Big Data in Bioinformatics. arXiv, 2013.