Constructing Topical Hierarchies in Heterogeneous Information Networks

A digital data collection (e.g., scientific publications, enterprise reports, news, and social media) can often be modeled as a heterogeneous information network that links text with multiple types of entities. Constructing high-quality concept hierarchies that represent topics at multiple granularities benefits tasks such as search, information browsing, and pattern mining. In this work, we present an algorithm for recursively constructing multi-typed topical hierarchies. Unlike traditional text-based topic modeling, our approach handles both textual phrases and multiple types of entities through a newly designed clustering and ranking algorithm for heterogeneous network data, and it mines and ranks topical patterns of different types. Our experiments on datasets from two different domains demonstrate that our algorithm yields high-quality, multi-typed topical hierarchies.
