Fast inference of ill-posed problems within a convex space

Many scientific and technological applications require explaining low-dimensional data with a linear model defined in a high-dimensional parameter space. The mismatch in dimensionality makes the problem ill-posed: the model is consistent with the data for many values of its parameters. The objective is then to find the probability distribution of the parameter values consistent with the data, a problem that can be cast as the exploration of a high-dimensional convex polytope. In this work we introduce a novel algorithm that solves this problem efficiently. Its results are statistically indistinguishable from those of currently used numerical techniques, while its running time scales linearly with the system size. We show that the algorithm performs robustly in many abstract and practical applications. As working examples, we simulate the effects of restricting reaction fluxes on the space of feasible phenotypes of a genome-scale E. coli metabolic network, and we infer the traffic flow between origin and destination nodes in a real communication network.
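To make the setting concrete, the following is a minimal sketch of how such a polytope can be explored by Monte Carlo sampling. It is not the algorithm introduced in this work; it illustrates Hit-and-Run, one commonly used sampling technique for convex bodies. The formulation is an assumption made for illustration: the data are written as linear constraints A x = b with fewer equations than unknowns, and each parameter is confined to a box lower <= x <= upper. The names `A`, `x0`, `lower`, `upper`, `orthonormal_rows` and `hit_and_run` are hypothetical.

```python
import math
import random


def orthonormal_rows(A, tol=1e-12):
    """Gram-Schmidt on the rows of A: orthonormal basis of the row space."""
    basis = []
    for row in A:
        v = [float(r) for r in row]
        for q in basis:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm > tol:  # skip linearly dependent rows
            basis.append([vi / norm for vi in v])
    return basis


def hit_and_run(A, x0, lower, upper, n_samples, burn_in=500, thin=20, seed=0):
    """Sample the polytope {x : A x = b, lower <= x <= upper}.

    x0 must be a feasible interior point (it fixes b = A x0 implicitly).
    """
    rng = random.Random(seed)
    Q = orthonormal_rows(A)
    x = [float(v) for v in x0]
    samples = []
    for step in range(burn_in + n_samples * thin):
        # Random direction, projected onto the null space of A so that
        # moving along it leaves A x unchanged.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        for q in Q:
            dot = sum(di * qi for di, qi in zip(d, q))
            d = [di - dot * qi for di, qi in zip(d, q)]
        # Range of t for which x + t * d stays inside the box.
        t_min, t_max = -math.inf, math.inf
        for xi, di, lo, hi in zip(x, d, lower, upper):
            if abs(di) > 1e-12:
                t1, t2 = (lo - xi) / di, (hi - xi) / di
                t_min = max(t_min, min(t1, t2))
                t_max = min(t_max, max(t1, t2))
        # Move to a uniformly chosen point on the feasible chord.
        if math.isfinite(t_min) and math.isfinite(t_max) and t_min < t_max:
            t = rng.uniform(t_min, t_max)
            x = [xi + t * di for xi, di in zip(x, d)]
        if step >= burn_in and (step - burn_in) % thin == 0:
            samples.append(list(x))
    return samples


# Toy problem: one measurement (x1 + x2 + x3 = 1), three unknowns in [0, 1].
samples = hit_and_run([[1.0, 1.0, 1.0]], [1 / 3] * 3, [0.0] * 3, [1.0] * 3,
                      n_samples=1000)
means = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
print("estimated marginal means:", means)
```

In this toy case the feasible set is a triangle, and histograms of each coordinate of `samples` approximate the marginal distributions of the parameters consistent with the single measurement. The burn-in and thinning values are arbitrary choices for the sketch and would need tuning on larger, poorly conditioned polytopes.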
