Discretization of gene expression data revised

Gene expression measurements represent the most important source of biological data used to unveil the interactions and functions of genes. Many of the data mining and machine learning algorithms proposed for this purpose require some form of data discretization before inference can be performed. The choice of discretization process has a major impact on the design and outcome of the inference algorithms, and several issues must be weighed when making it. This study reviews the current state-of-the-art discretization techniques, together with the key issues to consider when designing or selecting a discretization approach for gene expression data.
