Relational Learning and Feature Extraction by Querying over Heterogeneous Information Networks

Many real-world systems need to operate on heterogeneous information networks that consist of numerous interacting components of different types. Examples include systems that perform data analysis on biological information networks, social networks, and information extraction systems that process unstructured data to convert raw text into knowledge graphs. Much previous work describes specialized approaches to specific types of analysis, mining, and learning on such networks. In this work, we propose a unified framework consisting of a data model (a graph with a first-order schema) along with a declarative language for constructing, querying, and manipulating such networks in ways that facilitate relational and structured machine learning. In particular, we provide an initial prototype of a relational and graph-traversal query language in which queries are used directly as relational features for structured machine learning models. Feature extraction is performed by issuing declarative graph-traversal queries. Learning and inference models can operate directly on this relational representation and augment it with new data and knowledge that, in turn, are integrated seamlessly into the relational structure to support new predictions. We demonstrate the system's capabilities on tasks in the natural language processing and computational biology domains.
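
To make the idea of "queries as relational features" concrete, the following is a minimal sketch in Python. It is not the query language proposed in this work: the `HeteroGraph` class, the `traverse` and `features` functions, and the node and edge type names (`Token`, `Sentence`, `next`, `in_sentence`, `has_token`) are all hypothetical, chosen only to illustrate how a graph with typed nodes and edges can be traversed and how the traversal results can be turned into features.

```python
from collections import defaultdict


class HeteroGraph:
    """Toy heterogeneous graph with typed nodes and typed directed edges.

    Hypothetical illustration only; not the data model of the paper.
    """

    def __init__(self):
        self.node_type = {}              # node id -> type name
        self.props = {}                  # node id -> property dict
        self.edges = defaultdict(list)   # (source id, edge type) -> target ids

    def add_node(self, node_id, node_type, **props):
        self.node_type[node_id] = node_type
        self.props[node_id] = props

    def add_edge(self, source, edge_type, target):
        self.edges[(source, edge_type)].append(target)

    def traverse(self, start, *edge_types):
        """Follow the given edge types in order, starting from one node."""
        frontier = [start]
        for edge_type in edge_types:
            frontier = [
                target
                for node in frontier
                for target in self.edges.get((node, edge_type), [])
            ]
        return frontier


def features(graph, token):
    """Build a feature set for a token from traversal queries."""
    feats = {f"word={graph.props[token]['text']}"}
    # Query 1: the token that follows this one.
    for nxt in graph.traverse(token, "next"):
        feats.add(f"next_word={graph.props[nxt]['text']}")
    # Query 2: the number of tokens in the enclosing sentence.
    for sentence in graph.traverse(token, "in_sentence"):
        count = len(graph.traverse(sentence, "has_token"))
        feats.add(f"sentence_length={count}")
    return feats


# Build a one-sentence graph: Sentence s1 contains Tokens t0, t1, t2.
g = HeteroGraph()
g.add_node("s1", "Sentence")
for i, word in enumerate(["Saul", "calls", "back"]):
    g.add_node(f"t{i}", "Token", text=word)
    g.add_edge(f"t{i}", "in_sentence", "s1")
    g.add_edge("s1", "has_token", f"t{i}")
g.add_edge("t0", "next", "t1")
g.add_edge("t1", "next", "t2")

print(sorted(features(g, "t0")))
```

In this sketch each feature is defined by a traversal path rather than by hand-written extraction code, so a prediction written back into the graph as a new node or edge would become reachable by later queries in the same way.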
