Solving diversified top-k weight clique search problem

Diversified top-k weight clique search (DTKWCS) is the following problem: given a weighted graph $G$ and an integer $k$, compute $k$ cliques $c_1, c_2, \ldots, c_k$ of $G$ that maximize the total weight of the vertices they contain, $\sum_{v \in c_1 \cup c_2 \cup \cdots \cup c_k} w(v)$, where $w(v)$ is the weight of vertex $v$ in $G$. Because the sum ranges over the union, a vertex that belongs to several cliques is counted once. This problem is NP-hard. It has applications in spectrum sharing, advertisement placement, gene expression analysis and motif discovery, influential community search, sensor placement, and anomaly detection in complex networks [1–4]. For DTKWCS in unweighted graphs, a straightforward direct approach based on enumerating all cliques has been used [5]; however, it is time-consuming and unsuitable for large graphs. Another direct approach yields approximate solutions [6]; however, it is not competitive on dense graphs and cannot guarantee the optimality of its solutions. It is therefore worth exploring a generic approach to solving DTKWCS. In this study, we provide such an approach: we encode DTKWCS into the weighted partial MaxSAT (WPMS) problem and then solve the resulting WPMS instances with state-of-the-art solvers. Solving NP problems, both academic and industrial, by encoding them into SAT or WPMS has been shown to be an efficient strategy [7]. To encode DTKWCS into WPMS, we present two encoding strategies: direct encoding (DE) and independent set partition based encoding (ISPE). The experimental results, reported in the supporting information, show that both encoding strategies are competitive.

Preliminaries. $G = \langle V, E, w \rangle$ is an undirected weighted graph, where $V$ and $E$ are the sets of vertices and edges, respectively, and $w$ is a weight function that assigns a nonnegative integer, called its weight, to each vertex $v$. A clique $c_i$ of $G$ is a subset of vertices of $G$ in which every two distinct vertices are adjacent.
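To make the objective concrete, the following Python sketch evaluates $\sum_{v \in c_1 \cup \cdots \cup c_k} w(v)$ for a given choice of cliques and, for very small graphs only, finds an optimum by enumerating every clique and every choice of at most $k$ of them. It is a naive illustration of an enumeration-based baseline, not the algorithm of [5] or the authors' code; the adjacency-set representation, the 0-based vertex ids, and the function names `is_clique`, `dtkwcs_objective`, and `brute_force_dtkwcs` are hypothetical.

```python
from itertools import combinations

# Hypothetical helpers: an illustrative sketch, not the authors' implementation.


def is_clique(vertices, adj):
    """True if every two distinct vertices in `vertices` are adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def dtkwcs_objective(cliques, weights):
    """Total weight of c1 ∪ ... ∪ ck; a vertex shared by several cliques counts once."""
    covered = set().union(*cliques)
    return sum(weights[v] for v in covered)


def brute_force_dtkwcs(n, edges, weights, k):
    """Exhaustive DTKWCS for tiny graphs (exponential in n). Vertices are 0..n-1."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Enumerate every non-empty clique by testing all vertex subsets.
    cliques = [
        frozenset(s)
        for r in range(1, n + 1)
        for s in combinations(range(n), r)
        if is_clique(s, adj)
    ]
    # Weights are nonnegative, so choosing at most k cliques gives the same
    # optimum as choosing exactly k (the remaining cliques can repeat or be empty).
    best_value, best_choice = 0, ()
    for r in range(1, k + 1):
        for choice in combinations(cliques, r):
            value = dtkwcs_objective(choice, weights)
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice


# Example: triangle {0, 1, 2}, plus edges (2, 3) and (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
weights = [3, 2, 4, 1, 5]
print(dtkwcs_objective([{0, 1, 2}, {2, 3}], weights))  # 10: vertex 2 counted once
print(brute_force_dtkwcs(5, edges, weights, 2)[0])     # 15
print(brute_force_dtkwcs(5, edges, weights, 1)[0])     # 9
```

Restricting the candidates to maximal cliques would not change the optimum, since weights are nonnegative, but the number of cliques can still grow exponentially with the graph size, which is why enumeration of this kind does not scale to large graphs.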
A literal is either a Boolean variable $x$ (simply called a variable in the rest of this study) or its negation $\neg x$. A clause is a disjunction of literals; it is satisfied if and only if at least one of its literals takes the value true. A weighted clause is a pair $(c, w)$, where $c$ is a clause and $w$ is its weight. A weighted clause is hard if its weight is infinite and soft otherwise. A WPMS formula $F$ in CNF is a conjunction of hard and soft clauses. The WPMS problem is to find a truth assignment for $F$ that satisfies all hard clauses and maximizes the sum of the weights of the satisfied soft clauses.

Direct encoding. DE is based on the following observations. First, because DTKWCS requires finding $k$ cliques, we encode each vertex into $k$ variables; that is, vertex $v_i$ is expanded into the variables $x_{i1}, x_{i2}, \ldots, x_{ik}$, where $x_{ij}$ is true if and only if $v_i$ is in the $j$th clique. Second, DTKWCS and WPMS are both maximization problems: DTKWCS maximizes the total weight of the covered vertices, and WPMS maximizes the total weight of the satisfied soft clauses. DE then creates hard clauses that guarantee that every feasible solution of the WPMS instance forms $k$ cliques. Finally, DE encodes the soft clauses directly: each vertex $v_i$ defines a soft clause that is satisfied if and only if $v_i$ is in at least one of the $k$ cliques. Formally, given a graph $G = \langle V, E, w \rangle$ and an integer $k$, DE is defined as follows.
(1) For each $v_i \in V$, create $k$ variables $x_{i1}, x_{i2}, \ldots, x_{ik}$.
(2) For any two distinct non-adjacent vertices $v_i$ and $v_j$ in $V$ (i.e., $\langle v_i, v_j \rangle \notin E$), create $k$ hard clauses: $(\neg x_{i1} \vee \neg x_{j1}, \infty), (\neg x_{i2} \vee \neg x_{j2}, \infty), \ldots, (\neg x_{ik} \vee \neg x_{jk}, \infty)$.
(3) For each vertex $v_i \in V$, create a soft clause $(x_{i1} \vee x_{i2} \vee \cdots \vee x_{ik}, w(v_i))$.
We denote the resulting WPMS formula by $\varphi$. The DE encoding has the following properties.
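The three DE rules translate almost line by line into code. The Python sketch below builds $\varphi$ as a list of (clause, weight) pairs, using `float("inf")` for hard clauses and the integer id $i \cdot k + j + 1$ for $x_{ij}$ (0-based $i$ and $j$; this numbering is an assumption for illustration). It also serializes the formula in the classic `p wcnf` text format used by earlier MaxSAT Evaluations, where hard clauses carry a weight `top` larger than the sum of all soft weights, and, for tiny instances only, solves it by exhaustive search so that the WPMS optimum can be compared with the DTKWCS optimum. The helper names `var`, `direct_encoding`, `to_wcnf`, and `brute_force_wpms` are hypothetical; a real experiment would pass the WCNF text to a MaxSAT solver instead of enumerating assignments.

```python
from itertools import combinations, product

# Hypothetical helpers: an illustrative sketch of DE, not the authors' implementation.
INF = float("inf")  # weight marking a hard clause


def var(i, j, k):
    """1-based DIMACS-style id of x_ij: vertex i lies in clique j (i, j are 0-based)."""
    return i * k + j + 1


def direct_encoding(n, edges, weights, k):
    """DE rules (1)-(3): list of (clause, weight); +v stands for x_v, -v for ¬x_v."""
    edge_set = {frozenset(e) for e in edges}
    phi = []
    # Rule (2): two non-adjacent vertices cannot both be in the same clique j.
    for a, b in combinations(range(n), 2):
        if frozenset((a, b)) not in edge_set:
            for j in range(k):
                phi.append(((-var(a, j, k), -var(b, j, k)), INF))
    # Rule (3): soft clause x_a1 ∨ ... ∨ x_ak with weight w(v_a).
    for a in range(n):
        phi.append((tuple(var(a, j, k) for j in range(k)), weights[a]))
    return phi  # rule (1): the n*k variables exist implicitly through var()


def to_wcnf(phi, num_vars):
    """Serialize as 'p wcnf <vars> <clauses> <top>' followed by 'weight lits... 0' lines."""
    top = sum(w for _, w in phi if w != INF) + 1  # exceeds all soft weights combined
    lines = [f"p wcnf {num_vars} {len(phi)} {top}"]
    for clause, w in phi:
        weight = top if w == INF else w
        lines.append(" ".join(map(str, (weight, *clause, 0))))
    return "\n".join(lines)


def satisfied(clause, assignment):
    """assignment[v] is the truth value of variable v (index 0 is unused)."""
    return any(assignment[abs(lit)] == (lit > 0) for lit in clause)


def brute_force_wpms(phi, num_vars):
    """Exhaustive WPMS over 2**num_vars assignments: best soft weight with all hard clauses true."""
    best = None
    for bits in product((False, True), repeat=num_vars):
        assignment = (None,) + bits  # shift so that variable ids start at 1
        if not all(satisfied(c, assignment) for c, w in phi if w == INF):
            continue  # infeasible: some hard clause is falsified
        value = sum(w for c, w in phi if w != INF and satisfied(c, assignment))
        if best is None or value > best:
            best = value
    return best


edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
weights = [3, 2, 4, 1, 5]
k = 2
phi = direct_encoding(5, edges, weights, k)
print(to_wcnf(phi, 5 * k).splitlines()[0])  # p wcnf 10 15 16
print(brute_force_wpms(phi, 5 * k))         # 15
```

On this toy graph the WPMS optimum (15) matches the best total weight of two cliques, $\{v_0, v_1, v_2\}$ and $\{v_3, v_4\}$. Counting the rules directly, $\varphi$ has $nk$ variables, $k\left(\binom{n}{2} - |E|\right)$ hard clauses (one per clique index and non-adjacent pair), and $n$ soft clauses, so on sparse graphs the number of hard clauses grows quadratically in $n$.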

[1] Xiaoqi Zheng, et al. Large cliques in Arabidopsis gene coexpression network and motif discovery. Journal of Plant Physiology, 2011.

[2] Fergal Reid, et al. Detecting highly overlapping community structure by greedy clique expansion. KDD, 2010.

[3] Carlos Ansótegui, et al. MaxSAT Evaluation 2017: Solver and Benchmark Descriptions. 2017.

[4] Lijun Chang, et al. Diversified top-k clique search. The VLDB Journal, 2015.

[5] Zhu Zhu, et al. Exact MinSAT Solving. SAT, 2010.

[6] Alexey Ignatiev, et al. RC2: an Efficient MaxSAT Solver. J. Satisf. Boolean Model. Comput., 2019.

[7] Andreas Krause, et al. Near-optimal Observation Selection using Submodular Functions. AAAI, 2007.