Scaling up Inference in MLNs with Spark

Inference algorithms for big data typically target non-relational data, yet much real-world data, such as social network and healthcare data, is relational in nature. We therefore need more powerful techniques that scale richer inference algorithms to relational data. Markov Logic Networks (MLNs) are arguably the most popular statistical relational model, capable of representing complex, uncertain knowledge succinctly. In this paper, we scale up inference algorithms for MLNs to big relational data. The probabilistic graphical model underlying an MLN is typically extremely large even for small problems, which makes inference on it highly challenging. A predominant approach to improving scalability is lifted inference, which avoids constructing the full graphical model underlying the MLN; instead, it exploits symmetries in the distribution to reduce the size of the model. A popular way to perform lifting uses clustering techniques to group together variables with similar distributional characteristics. For big relational data, however, identifying these symmetries quickly becomes infeasible. We design a novel lifted inference system built on top of Spark that exploits parallelism to identify symmetries in the MLN, unifying advances in inference for relational data with advances in big data processing technologies. Using Spark, we show that we can perform more accurate inference and scale relational inference to datasets orders of magnitude larger than is currently possible with state-of-the-art MLN systems.
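To make the clustering-based symmetry idea concrete, the sketch below shows, under assumed data structures, how approximate symmetry groups might be computed in parallel with Spark: ground atoms carrying a hypothetical evidence signature are grouped by (predicate, signature), and each group is treated as one lifted cluster of (approximately) exchangeable atoms. The atom representation, the evidence-signature field, and the single groupByKey step are illustrative assumptions, not the system described in this paper.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch (not the paper's actual algorithm): cluster ground atoms that
// share a predicate and a hypothetical evidence signature, treating each cluster
// as a group of approximately exchangeable atoms for lifted inference.
object LiftedSymmetrySketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("LiftedSymmetrySketch")
      .master("local[*]")
      .getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical ground atoms: (predicate, constant, evidence signature).
    // In a real MLN the signature would summarize how evidence touches the atom.
    val groundAtoms = sc.parallelize(Seq(
      ("Smokes", "Alice", 3L), ("Smokes", "Bob",   3L),
      ("Smokes", "Carol", 1L), ("Cancer", "Alice", 2L)
    ))

    // Atoms with the same (predicate, signature) fall into one lifted cluster;
    // zipWithIndex assigns each cluster an id.
    val clusters = groundAtoms
      .map { case (pred, const, sig) => ((pred, sig), const) }
      .groupByKey()
      .zipWithIndex()

    clusters.collect().foreach { case (((pred, sig), members), id) =>
      println(s"cluster $id: predicate=$pred signature=$sig size=${members.size}")
    }

    spark.stop()
  }
}
```

The grouping step is embarrassingly parallel, which is the property the Spark-based design relies on: each cluster can then stand in for all of its member atoms during inference instead of grounding them individually.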
