Learning Hierarchical Relationships among Partially Ordered Objects with Heterogeneous Attributes and Links

The links an object has to many other objects in an information network may imply various semantic relationships. Uncovering such knowledge is essential for role discovery, data cleaning, and better organization of information networks, especially when the semantically meaningful relationships are hidden or mingled with noisy links and attributes. In this paper, we study a generic form of relationship along which objects form a tree-like structure, a structure that is pervasive across domains. We formalize the problem of uncovering hierarchical relationships in a supervised setting. In general, three kinds of evidence can be used to infer such relationships: local features of object attributes, interaction patterns among objects, and rules and constraints for knowledge propagation. Existing approaches, designed for specific applications, either cannot handle dependency rules together with local features or cannot leverage labeled data to weigh their relative importance. We propose a discriminative undirected graphical model that integrates a wide range of features and rules by defining potential functions with simple forms. We also summarize and categorize these functions. Experiments on three quite different domains demonstrate how to apply the method to encode domain knowledge. We measure its efficacy with both traditional metrics and newly designed ones for evaluating the discovered tree structures.
