Conveying information about who, what, when, and where is a primary purpose of some genres of documents, most typically news articles. To handle such information, statistical models that capture dependencies between named entities and topics can play an important role. Although relationships between who and where are often mentioned in such documents, no statistical topic model has explicitly addressed the textual interactions between a who-entity and a where-entity. This paper presents a statistical model that directly captures dependencies among an arbitrary number of word types, such as who-entities, where-entities, and topics, mentioned in each document. Through experiments on predicting who-entities and the links between them, we show that this multitype topic model makes better predictions on entity networks, in which each vertex represents an entity and each edge weight represents how closely the pair of entities at its incident vertices is related.
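The paper gives the formal model; purely as a rough illustration of the idea, the following Python sketch samples documents from one plausible multitype topic model in which all word types (e.g. ordinary words, who-entities, where-entities) share a single per-document topic distribution and are thereby coupled through common topic assignments. Every function name, parameter, and hyperparameter value here is a hypothetical assumption, not the authors' implementation.

    import numpy as np

    def generate_corpus(n_docs, n_topics, vocab_sizes, alpha=0.1, beta=0.01,
                        doc_length=50, seed=0):
        """Sample a synthetic corpus from a multitype topic model (sketch).

        Each document draws one topic distribution theta shared by all word
        types; each word type t has its own per-topic word distributions
        phi[t][k], so who-entities, where-entities, and ordinary words are
        tied together through the shared topic assignments.
        """
        rng = np.random.default_rng(seed)
        n_types = len(vocab_sizes)
        # Per-type, per-topic word distributions: phi[t] has shape (n_topics, V_t).
        phi = [rng.dirichlet(np.full(v, beta), size=n_topics) for v in vocab_sizes]
        corpus = []
        for _ in range(n_docs):
            theta = rng.dirichlet(np.full(n_topics, alpha))  # shared across types
            doc = []
            for _ in range(doc_length):
                t = rng.integers(n_types)          # word type (0=word, 1=who, 2=where, ...)
                z = rng.choice(n_topics, p=theta)  # topic from the shared mixture
                w = rng.choice(vocab_sizes[t], p=phi[t][z])  # token of that type
                doc.append((t, w))
            corpus.append(doc)
        return corpus, phi

    # e.g. three word types: plain words, who-entities, where-entities
    corpus, phi = generate_corpus(n_docs=100, n_topics=20,
                                  vocab_sizes=[5000, 300, 150])

Fitting such a model on real text (for instance with collapsed Gibbs sampling) would yield per-topic entity distributions, from which edge weights between entity pairs on an entity network could be scored.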