Efficient In-Database Analytics with Graphical Models

Driven by recent application demands, there is strong interest in industry in extending database systems to perform efficient and scalable in-database analytics based on probabilistic graphical models (PGMs). We discuss the issues involved in supporting PGM methods inside the database and present techniques for a deep integration of these methods into the relational data model as well as into the query processing and optimization engine. This is an active research area, and the techniques discussed here are still being developed and evaluated.
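The abstract does not spell out the integration techniques. One way to see why PGM inference fits a relational engine at all is to store each discrete factor as a relation, with one row per variable assignment and a `val` column holding its potential. Under that representation, a sum-product step of variable elimination becomes a join (factor product) followed by a `GROUP BY ... SUM` (marginalization). The sketch below is purely illustrative and assumes a toy chain A-B-C with binary variables. The table names `phi_ab` and `phi_bc`, the potentials, and the helper functions are hypothetical. It uses only Python's built-in `sqlite3` module and is not presented as the method of this work.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory DB holding two hypothetical factor tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE phi_ab (a INTEGER, b INTEGER, val REAL);  -- factor over (A, B)
        CREATE TABLE phi_bc (b INTEGER, c INTEGER, val REAL);  -- factor over (B, C)
        """
    )
    # Illustrative potentials only; one row per joint assignment.
    conn.executemany(
        "INSERT INTO phi_ab VALUES (?, ?, ?)",
        [(0, 0, 3.0), (0, 1, 1.0), (1, 0, 2.0), (1, 1, 2.0)],
    )
    conn.executemany(
        "INSERT INTO phi_bc VALUES (?, ?, ?)",
        [(0, 0, 2.0), (0, 1, 1.0), (1, 0, 1.0), (1, 1, 2.0)],
    )
    return conn


def marginal_c(conn: sqlite3.Connection) -> dict:
    """Compute P(C) by variable elimination expressed entirely in SQL."""
    rows = conn.execute(
        """
        WITH m_b AS (
            -- Eliminate A: message m(b) = SUM over a of phi_ab(a, b)
            SELECT b, SUM(val) AS val FROM phi_ab GROUP BY b
        ),
        unnorm AS (
            -- Factor product via join on B, then eliminate B with GROUP BY c
            SELECT f.c AS c, SUM(m.val * f.val) AS val
            FROM m_b AS m JOIN phi_bc AS f ON m.b = f.b
            GROUP BY f.c
        )
        -- Normalize by the partition function (sum of unnormalized values)
        SELECT c, val / (SELECT SUM(val) FROM unnorm) FROM unnorm ORDER BY c
        """
    ).fetchall()
    return {c: p for c, p in rows}


if __name__ == "__main__":
    print(marginal_c(build_db()))
```

With these potentials, summing out A gives m(0) = 5 and m(1) = 3, so the unnormalized marginal is 13 for C = 0 and 11 for C = 1. That yields P(C=0) = 13/24 and P(C=1) = 11/24. Because every step is an ordinary join or aggregate, the query optimizer is free to choose join orders and access paths. This is the kind of opportunity a deeper integration with the query engine could exploit.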
