Reasoning about Large Populations with Lifted Probabilistic Inference

We use a concrete meeting-planning problem to show how lifted probabilistic inference can dramatically speed up reasoning. We also extend lifted inference to handle cardinality potentials, and examine how to incorporate background knowledge about a social network.

Lifted inference: An example. Suppose that n people (say, n = 100) have been invited to a NIPS workshop, and we want to know whether the attendees will overflow the 40-seat room we have reserved. A graphical model for this scenario is shown in Fig. 1(a). In this simple model, the attendance variables attend(p_i), one for each person p_i, are conditionally independent given the workshop’s popularity. We get noisy information about each person’s attendance from the reply they have sent us: “yes”, “no”, or “no reply”. Assume for the moment that we only want to estimate the workshop’s popularity, ignoring the roomOverflow variable. Then we need to compute the marginal distribution of the popularity variable given the reply variables.

One commonly used algorithm, variable elimination (VE), computes this marginal by eliminating each attend(p_i) variable in turn: for each p_i, it multiplies together the factors φ(attend(p_i), reply(p_i)) and φ(attend(p_i), popularity), then sums out attend(p_i) to obtain a factor on popularity alone. The resulting n factors on popularity are multiplied together and normalized to yield the posterior distribution. The running time is linear in n. The basic insight of lifted inference algorithms such as first-order VE (FOVE) [3, 1] is that, because this model treats the n invitees interchangeably, VE repeats exactly the same multiplications and summations for every invitee who sent the same reply. We can avoid this repeated work by explicitly representing the interchangeability of entities, as shown in Fig. 1(b).
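The ground elimination loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the three popularity levels, every conditional probability, the uniform prior implied by starting the product at all ones, and the names `ATTEND_PROB`, `REPLY_GIVEN_ATTEND`, `eliminate_attend`, and `ground_posterior` are hypothetical illustrations, not parameters of the model in Fig. 1(a). The reply value "none" stands for “no reply”.

```python
# Hypothetical model parameters, assumed for illustration only.
ATTEND_PROB = [0.2, 0.5, 0.8]  # P(attend = true) for each popularity level
REPLY_GIVEN_ATTEND = {         # P(reply | attend); "none" means no reply
    True: {"yes": 0.7, "no": 0.05, "none": 0.25},
    False: {"yes": 0.05, "no": 0.6, "none": 0.35},
}


def eliminate_attend(reply):
    """Multiply phi(attend, reply) by phi(attend, popularity), then sum out attend.

    Returns a factor on popularity: one entry per popularity level.
    """
    return [p * REPLY_GIVEN_ATTEND[True][reply]
            + (1 - p) * REPLY_GIVEN_ATTEND[False][reply]
            for p in ATTEND_PROB]


def ground_posterior(replies):
    """Ground VE: one elimination per invitee, so the work grows linearly in n."""
    post = [1.0] * len(ATTEND_PROB)  # all-ones start = assumed uniform prior
    for reply in replies:
        phi = eliminate_attend(reply)
        post = [a * b for a, b in zip(post, phi)]
    z = sum(post)  # normalize the product of the n factors
    return [x / z for x in post]


replies = ["yes"] * 30 + ["no"] * 50 + ["none"] * 20  # n = 100 invitees
print(ground_posterior(replies))
```

Note that `eliminate_attend` returns the identical factor for every invitee with the same reply; this is the repeated work that lifted inference avoids. The product is taken in ordinary floating point, which is fine for n = 100, but a much larger n would call for working in log space.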
Instead of specifying factors for each person separately, we use parameterized factors, or parfactors, in which the random variables are parameterized by logical variables. In our case we need just two parfactors, φ(attend(P), reply(P)) and φ(attend(P), popularity), which apply to all people P. Given observations of people’s replies, the FOVE algorithm shatters each of these parfactors into three copies: one for the $n_+$ people who said “yes”, one for the $n_-$ who said “no”, and one for the $n_0$ people who did not reply. It then performs elimination just three times, once per group, rather than n times as before. This yields three factors on popularity, which we call $\phi_+$, $\phi_-$, and $\phi_0$. The posterior distribution on popularity is then proportional to $\phi_+(\mathit{popularity})^{n_+} \times \phi_-(\mathit{popularity})^{n_-} \times \phi_0(\mathit{popularity})^{n_0}$. Assuming unit cost for exponentiation, and given the group sizes $n_+$, $n_-$, and $n_0$, this lifted algorithm takes constant time, removing the linear dependence on n.

Cardinality potentials. Now consider our original goal of predicting whether the attendees will overflow the 40-seat room. For this purpose, we can query the roomOverflow variable in Fig. 1, which deterministically indicates whether more than 40 of the attend(p_i) variables are true. The first difficulty is that a tabular representation of the factor linking roomOverflow to all the attend(p_i) variables would require space exponential in n: with n binary attendance variables plus roomOverflow, the table has $2^{n+1}$ entries. Although more compact representations for such factors have been developed [4], the FOVE algorithm [1] does not exploit them.

[Figure 1(a): graphical model over popularity, attend(p1), ..., attend(pN), and reply(p1), ..., reply(pN).]
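The lifted computation of the popularity posterior and the roomOverflow query discussed above can be sketched together in Python. This is a minimal sketch under stated assumptions: the three popularity levels, all conditional probabilities, the uniform prior (the product starts at all ones), and the names `ATTEND_PROB`, `REPLY_GIVEN_ATTEND`, `CAPACITY`, `eliminate_attend`, `lifted_posterior`, `binomial_pmf`, `convolve`, and `overflow_probability` are hypothetical. `lifted_posterior` mirrors the three-group elimination followed by exponentiation by group size. `overflow_probability` is not FOVE and is not claimed to be the compact representation of [4]; it is a plain dynamic program that convolves one binomial count distribution per reply group, illustrating that the overflow probability can be computed without an exponentially large table.

```python
from collections import Counter
from math import comb

# Hypothetical model parameters, assumed for illustration only.
ATTEND_PROB = [0.2, 0.5, 0.8]  # P(attend = true) for each popularity level
REPLY_GIVEN_ATTEND = {         # P(reply | attend); "none" means no reply
    True: {"yes": 0.7, "no": 0.05, "none": 0.25},
    False: {"yes": 0.05, "no": 0.6, "none": 0.35},
}
CAPACITY = 40  # seats in the room


def eliminate_attend(reply):
    """Sum out attend for one person with this reply; returns a factor on popularity."""
    return [p * REPLY_GIVEN_ATTEND[True][reply]
            + (1 - p) * REPLY_GIVEN_ATTEND[False][reply]
            for p in ATTEND_PROB]


def lifted_posterior(replies):
    """One elimination per reply group, then raise each group factor to its group size."""
    counts = Counter(replies)  # group sizes n_+, n_-, n_0
    post = [1.0] * len(ATTEND_PROB)  # all-ones start = assumed uniform prior
    for reply, n in counts.items():
        phi = eliminate_attend(reply)
        post = [a * b ** n for a, b in zip(post, phi)]
    z = sum(post)
    return [x / z for x in post]


def binomial_pmf(n, q):
    """P(k successes in n independent trials with success probability q), k = 0..n."""
    return [comb(n, k) * q ** k * (1 - q) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def overflow_probability(replies):
    """P(more than CAPACITY attend | replies), using count distributions
    instead of a table over all joint attendance assignments."""
    counts = Counter(replies)
    post = lifted_posterior(replies)
    total = 0.0
    for level, p in enumerate(ATTEND_PROB):
        # Given popularity and replies, attendances are independent, and
        # everyone in the same reply group shares one attendance probability.
        count_dist = [1.0]  # number attending among people processed so far
        for reply, n in counts.items():
            on = p * REPLY_GIVEN_ATTEND[True][reply]
            off = (1 - p) * REPLY_GIVEN_ATTEND[False][reply]
            count_dist = convolve(count_dist, binomial_pmf(n, on / (on + off)))
        total += post[level] * sum(count_dist[CAPACITY + 1:])
    return total


replies = ["yes"] * 30 + ["no"] * 50 + ["none"] * 20  # n = 100 invitees
print(lifted_posterior(replies))
print(overflow_probability(replies))
```

Tallying the replies with `Counter` still reads all n observations; the constant-time claim in the text concerns inference once the group sizes are known. The convolution in `overflow_probability` takes time polynomial in n, in contrast to the exponential size of a tabular roomOverflow factor.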