Inverse finite-size scaling for high-dimensional significance analysis

We propose an efficient procedure, termed inverse finite-size scaling (IFSS), for determining significance in high-dimensional dependence learning based on surrogate data testing. IFSS builds on our discovery of a universal scaling property of random matrices, which enables inference about signal behavior from surrogate data of much smaller dimension than the original data. As a motivating example, we demonstrate the procedure on ultra-high-dimensional Potts models with on the order of 10^10 parameters. IFSS reduces the computational cost of the surrogate data testing procedure by several orders of magnitude, which makes it practical for real-world applications. The approach therefore holds considerable potential for generalization to other types of complex models.
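
The general idea of estimating a high-dimensional null level from smaller surrogate problems can be illustrated with a minimal sketch. It is not the authors' IFSS method, which works with Potts-model couplings and a random-matrix scaling law. Instead it assumes a simple stand-in setup: surrogates are made by shuffling each column independently, the test statistic is the maximum absolute pairwise Pearson correlation, and that maximum is assumed to grow linearly in sqrt(log M), where M is the number of variable pairs. The function names permutation_surrogate, max_abs_offdiag_corr and extrapolated_threshold are hypothetical.

```python
import numpy as np


def permutation_surrogate(x, rng):
    """Shuffle each column of x independently.

    This keeps every variable's marginal distribution but destroys any
    dependence between variables, so the surrogate follows the null hypothesis.
    """
    s = x.copy()
    for j in range(s.shape[1]):
        s[:, j] = rng.permutation(s[:, j])
    return s


def max_abs_offdiag_corr(x):
    """Largest absolute Pearson correlation over all pairs of columns of x."""
    c = np.corrcoef(x, rowvar=False)
    iu = np.triu_indices_from(c, k=1)
    return np.max(np.abs(c[iu]))


def extrapolated_threshold(x, small_dims, n_surrogates=20, seed=0):
    """Estimate the typical null maximum at full dimension from small surrogates.

    Assumption (illustrative only): the null maximum is approximately linear in
    sqrt(log M), with M = d(d-1)/2 pairs. We fit that line on random column
    subsets of size d in small_dims and evaluate it at the full dimension D,
    so no surrogate of full dimension is ever built.
    """
    rng = np.random.default_rng(seed)
    n, D = x.shape
    feats, stats = [], []
    for d in small_dims:
        for _ in range(n_surrogates):
            cols = rng.choice(D, size=d, replace=False)
            s = permutation_surrogate(x[:, cols], rng)
            feats.append(np.sqrt(np.log(d * (d - 1) / 2)))
            stats.append(max_abs_offdiag_corr(s))
    slope, intercept = np.polyfit(feats, stats, deg=1)
    m_full = D * (D - 1) / 2
    return intercept + slope * np.sqrt(np.log(m_full))


# Usage: independent Gaussian data with one planted dependent pair (columns 0, 1).
rng = np.random.default_rng(1)
n, D = 200, 400
x = rng.standard_normal((n, D))
x[:, 1] = x[:, 0] + 0.3 * rng.standard_normal(n)

thr = extrapolated_threshold(x, small_dims=[10, 20, 40, 80])
obs = np.corrcoef(x, rowvar=False)
full_null_max = max_abs_offdiag_corr(permutation_surrogate(x, rng))  # sanity check
print(f"extrapolated null max: {thr:.3f}, full-size surrogate max: {full_null_max:.3f}")
print("planted pair significant:", abs(obs[0, 1]) > thr)
```

Here the threshold is fitted to the mean of the null maximum. To obtain a significance level instead, one could fit an upper quantile of the surrogate maxima at each small dimension. The saving comes from running every surrogate at dimension at most 80 rather than 400, which is the kind of reduction the abstract describes at far larger scale.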
