Topical text network construction based on seed word augmentation

A topical text network is useful for analyzing a large document corpus because it gives an intuitive view of the topic distribution, including the topic words and the connections among them. This paper proposes a method for constructing a topical text network based on seed word augmentation. First, we manually select a few representative seed words for each topic and augment them with additional words chosen by a similarity metric. Second, we connect pairs of topic words whose similarity exceeds a defined threshold, which yields the topical text network. Using this method, we construct a topical network from aviation safety reports to analyze human factors, and find that the network is scale-free and small-world, with significant modularity.
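The two steps can be illustrated with a minimal sketch. The abstract does not name the similarity metric, the augmentation rule, or the threshold value, so everything specific here is an assumption made for illustration: cosine similarity over hand-made toy word vectors, a hypothetical `augment_seeds` helper that adds the `top_k` words closest to any seed, and a hypothetical `build_network` helper that links word pairs whose similarity is above `threshold`.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def augment_seeds(seeds, vectors, top_k=2):
    """Return the seeds plus the top_k non-seed words most similar to any seed."""
    candidates = [w for w in vectors if w not in seeds]
    ranked = sorted(
        candidates,
        key=lambda w: max(cosine(vectors[w], vectors[s]) for s in seeds),
        reverse=True,
    )
    return set(seeds) | set(ranked[:top_k])


def build_network(words, vectors, threshold=0.8):
    """Return undirected edges (a, b), a < b, whose similarity is above threshold."""
    edges = set()
    for a, b in combinations(sorted(words), 2):
        if cosine(vectors[a], vectors[b]) > threshold:
            edges.add((a, b))
    return edges


# Toy 2-D word vectors, invented for this example only.
vectors = {
    "fatigue": (1.0, 0.1),
    "tired": (0.9, 0.2),
    "sleep": (0.8, 0.3),
    "runway": (0.1, 1.0),
    "taxiway": (0.2, 0.9),
}

# Step 1: augment a manually chosen seed word for one topic.
topic_words = augment_seeds({"fatigue"}, vectors, top_k=2)

# Step 2: connect topic words whose similarity is above the threshold.
edges = build_network(topic_words, vectors, threshold=0.95)
```

In practice the vectors would come from the corpus itself (for example co-occurrence or embedding features), and the resulting edge set could be passed to a graph library to examine degree distribution, clustering, and modularity.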
