Statistical properties of interaction parameter estimates in direct coupling analysis
We consider the statistical properties of interaction parameter estimates obtained by the direct coupling analysis (DCA) approach to learning interactions from large data sets. Assuming that the data are generated from a random background distribution, we determine the distribution of the inferred interactions. Two inference methods are considered: the L2-regularized naive mean-field inference procedure (regularized least squares, RLS) and pseudo-likelihood maximization (plmDCA). For RLS we also study a model in which the elements of the data matrix are real numbers drawn independently and identically from a Gaussian distribution; in this setting we find analytically that the inferred interactions are Gaussian distributed. For Boolean data, which are more realistic in practice, the inferred interactions generally do not follow a Gaussian distribution. However, extensive numerical simulations indicate that, after normalization by their standard deviation, their distribution is characterized by a single function determined by a few system parameters. This property holds for both RLS and plmDCA and may be exploited to infer the distribution of extremely large interactions from simulations of smaller systems.
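The numerical experiment described above can be sketched for the RLS case as follows. This is a minimal illustration, not the authors' code: it assumes that RLS is implemented as a per-variable ridge regression (each variable regressed on all the others with an L2 penalty `lam`), and the sizes `M`, `N`, the value of `lam`, and the helper names `rls_couplings`, `offdiag` and `excess_kurtosis` are hypothetical choices made for illustration. The paper's exact regularization and normalization conventions may differ. The sketch draws a random background of Gaussian and of Boolean (±1) data, infers couplings, and normalizes them by their standard deviation so that their shapes can be compared.

```python
import numpy as np


def rls_couplings(X: np.ndarray, lam: float) -> np.ndarray:
    """Infer couplings by L2-regularized least squares (ridge regression).

    Assumption (illustrative, not taken from the paper): row i of J minimizes
        sum_t (x_i^t - sum_{j != i} J_ij x_j^t)^2 + lam * sum_{j != i} J_ij^2.
    X has shape (M samples, N variables); columns are centred first.
    """
    M, N = X.shape
    Xc = X - X.mean(axis=0)
    J = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)
        A = Xc[:, others]  # (M, N-1) design matrix of the other variables
        y = Xc[:, i]       # (M,) target variable
        # Closed-form ridge solution: (A^T A + lam I)^{-1} A^T y
        J[i, others] = np.linalg.solve(A.T @ A + lam * np.eye(N - 1), A.T @ y)
    return J


def offdiag(J: np.ndarray) -> np.ndarray:
    """Return the off-diagonal entries of J as a flat array."""
    return J[~np.eye(J.shape[0], dtype=bool)]


def excess_kurtosis(v: np.ndarray) -> float:
    """Sample excess kurtosis; close to 0 for Gaussian-distributed samples."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z**4) - 3.0)


rng = np.random.default_rng(0)
M, N, lam = 2000, 50, 1.0  # hypothetical sample size, system size, penalty

# Random background with i.i.d. Gaussian entries.
J_gauss = offdiag(rls_couplings(rng.standard_normal((M, N)), lam))

# Random background with i.i.d. Boolean (+/-1) entries.
J_bool = offdiag(rls_couplings(rng.choice([-1.0, 1.0], size=(M, N)), lam))

for name, v in [("gaussian", J_gauss), ("boolean", J_bool)]:
    z = v / v.std()  # normalize by the standard deviation
    print(name, f"std={v.std():.4f}", f"excess_kurtosis={excess_kurtosis(z):.3f}")
```

Repeating the Boolean run for several values of `N` and overlaying histograms of `z` is one way to check whether the normalized distributions collapse onto a single curve. Note that with `lam = 0` the ridge solution reduces to ordinary least squares, and larger `lam` shrinks all couplings toward zero.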