EBIC: an open source software for high-dimensional and big data analyses

Motivation: In this paper we present an open source package with the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this release is full multi-GPU support, which makes it possible to run large genomic data mining analyses efficiently. Further enhancements over the first release of the algorithm include integration with R and Bioconductor and an option to exclude missing values from the analysis.

Results: EBIC was applied to datasets of different sizes, including a large DNA methylation dataset with 436,444 rows. For the largest dataset, we observed a speedup of over 6.6-fold in computation time on a cluster of 8 GPUs compared with running the method on a single GPU. This demonstrates the high scalability of the method.

Availability: The latest version of EBIC can be downloaded from http://github.com/EpistasisLab/ebic. Installation and usage instructions are also available online.

Supplementary information: Supplementary information is available online.
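As a rough reading of the reported scaling figure, the speedup can be expressed as parallel efficiency, that is, the fraction of ideal linear speedup actually achieved. The sketch below is illustrative only: the function name `parallel_efficiency` is hypothetical and not part of EBIC, and the inputs are simply the two numbers quoted in the abstract (a speedup of 6.6 on 8 GPUs), so the result is a lower bound given that the reported speedup is "over 6.6".

```python
def parallel_efficiency(speedup: float, n_devices: int) -> float:
    """Return the fraction of ideal linear speedup achieved on n_devices.

    Hypothetical helper for illustration; it is not part of the EBIC package.
    """
    if n_devices < 1:
        raise ValueError("n_devices must be at least 1")
    return speedup / n_devices


# Figures quoted in the abstract: over 6.6-fold speedup on 8 GPUs vs. 1 GPU.
efficiency = parallel_efficiency(6.6, 8)
print(f"parallel efficiency of at least {efficiency:.1%}")
```

An efficiency of 1.0 would correspond to perfectly linear scaling, so a value above 0.8 on 8 GPUs is consistent with the scalability claim made in the abstract.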
