A Novel Algorithm for the Precise Calculation of the Maximal Information Coefficient

Measuring associations between variables is an important scientific task. A novel measure, the maximal information coefficient (MIC), was proposed to identify a broad class of associations. As its authors foresaw, ApproxMaxMI, the algorithm used to compute MIC, does not always converge to the true MIC value. We developed an algorithm called SG (Simulated annealing and Genetic) to compute MIC more accurately, and proved its convergence using Markov theory. On a fruit fly data set comprising 1,000,000 pairs of gene expression profiles, the mean squared difference between SG and the exhaustive algorithm was 0.00075499, compared with 0.1834 for ApproxMaxMI. The software SGMIC and its manual are freely available at http://lxy.depart.hebust.edu.cn/SGMIC/SGMIC.htm.
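For orientation, MIC is defined as the maximum, over grids with kx columns and ky rows satisfying kx * ky < n^0.6, of the mutual information of the gridded data divided by log2(min(kx, ky)). The sketch below illustrates only that definition. It is neither SG nor ApproxMaxMI: it tries a single equal-frequency grid per resolution instead of optimizing the grid boundaries, so it can only underestimate the true MIC. The function names (`equal_frequency_bins`, `mutual_information`, `crude_mic`) are hypothetical and are not part of SGMIC.

```python
import math
from collections import Counter


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins of nearly equal count, by rank.

    Assumption: ties are broken by input order, so tied values may be split
    across bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def mutual_information(bx, by):
    """Mutual information, in bits, of two equally long label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def crude_mic(x, y, alpha=0.6):
    """Lower bound on MIC using one equal-frequency grid per resolution."""
    budget = len(x) ** alpha  # grid-size limit B(n) = n^alpha
    best = 0.0
    kx = 2
    while kx * 2 < budget:
        ky = 2
        while kx * ky < budget:
            mi = mutual_information(
                equal_frequency_bins(x, kx), equal_frequency_bins(y, ky)
            )
            # Normalize by the largest mutual information this grid allows.
            best = max(best, mi / math.log2(min(kx, ky)))
            ky += 1
        kx += 1
    return best


if __name__ == "__main__":
    xs = list(range(64))
    print(crude_mic(xs, xs))  # a noiseless increasing relationship
```

An exhaustive algorithm would search every placement of grid lines at each resolution, which is why heuristics such as ApproxMaxMI or a stochastic search such as SG are used in practice.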
