BIDEAL: A Toolbox for Bicluster Analysis - Generation, Visualization and Validation

This paper introduces a novel toolbox named BIDEAL for the generation, analysis, visualization, and validation of biclusters. The objective is to make state-of-the-art biclustering algorithms available to researchers on a single platform. A single toolbox comprising various biclustering algorithms plays a vital role in extracting meaningful patterns from data, for tasks such as detecting diseases, identifying biomarkers, and finding gene-drug associations. BIDEAL consists of seventeen biclustering algorithms, three bicluster visualization techniques, and six validation indices. The toolbox can analyze several types of data, including biological data, through a graphical user interface. It also supports data preprocessing techniques, namely binarization, discretization, normalization, and elimination of null and missing values. The effectiveness of the toolbox is demonstrated through testing and validation on the Saccharomyces cerevisiae cell cycle, Leukemia cancer, Mammary tissue profile, and Ligand screen in B-cells datasets. The biclusters of these datasets have been generated using BIDEAL and evaluated in terms of coherency, differential co-expression ranking, and similarity measure. The generated biclusters are also visualized through heat maps and gene plots.
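To make the preprocessing and coherency steps mentioned above concrete, the following is a minimal Python sketch. It is not BIDEAL's implementation or interface: all function names are hypothetical, the row-wise min-max scaling and the 0.5 binarization threshold are assumptions chosen for illustration, and the mean squared residue of Cheng and Church is used here only as one widely known coherence score, since the abstract does not state which coherency measure the toolbox computes.

```python
from statistics import mean


def drop_missing_rows(matrix):
    """Remove rows containing a missing value (None or NaN)."""
    def is_missing(v):
        # NaN is the only float value that is not equal to itself.
        return v is None or v != v
    return [row for row in matrix if not any(is_missing(v) for v in row)]


def normalize_rows(matrix):
    """Min-max scale each row to [0, 1]; constant rows map to 0.0 (assumed convention)."""
    scaled = []
    for row in matrix:
        lo, hi = min(row), max(row)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in row])
    return scaled


def binarize(matrix, threshold=0.5):
    """Map each value to 1 if it reaches the threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row] for row in matrix]


def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    Lower values indicate a more coherent bicluster; a perfectly additive
    pattern (rows differing only by a constant shift) scores 0.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(c) for c in zip(*sub)]
    overall = mean(row_means)  # rows have equal length, so this is the grand mean
    return mean(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(len(sub))
        for j in range(len(sub[0]))
    )


# Toy expression matrix: rows are genes, columns are conditions.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, None, 1.0],  # dropped: contains a missing value
    [4.0, 0.0, 9.0],
]

clean = drop_missing_rows(data)
binary = binarize(normalize_rows(clean))
shifted_pair = mean_squared_residue(clean, rows=[0, 1], cols=[0, 1, 2])
all_rows = mean_squared_residue(clean, rows=[0, 1, 2], cols=[0, 1, 2])
```

In this toy matrix the first two genes differ only by a constant shift, so their residue is zero, while adding the third, unrelated gene raises the score. A validation index of this kind is what allows biclusters produced by different algorithms to be compared on a common scale.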
