Biclustering Impact in Biomedical Sciences via Literature Mining

Biclustering algorithms have matured since their initial applications in bioinformatics, evolving towards different approaches and bicluster definitions, which sometimes makes it hard for the analyst to determine which of the available algorithms best fits her problem. As a way of benchmarking these algorithms, several quality measures have been proposed in the literature. Such measures cover numerical aspects related to accuracy, recovery power, or the capability of retrieving previous biomedical knowledge. However, biclustering apparently remains an uncommon option for biomedical analysis. Here we review the impact of biclustering algorithms in biomedicine and bioinformatics, with the aim of measuring and understanding non-numerical aspects of these algorithms, focusing on citation-based statistics that can be relevant to their application in the domain. To this end, we analyzed the citation impact of several clustering and biclustering algorithms, and we propose a methodology that can cover this aspect of biclustering usage.
