Tracking Events Using Time-dependent Hierarchical Dirichlet Tree Model

Timeline generation aims to give readers a summary of how an event evolves by building news timelines from a massive news corpus. It is a new summarization challenge that combines salience ranking with novelty detection. For a long-term public event, the main topic usually comprises many sub-topics at different epochs, each with its own evolving pattern. Existing approaches fail to exploit this hierarchical topic structure in the news corpus for timeline generation. In this paper, we develop a novel time-dependent Hierarchical Dirichlet Tree Model (tHDT) for timeline generation. Our model detects topic information at different levels of the corpus, and the resulting structure is then used for sentence selection. Based on the topic distributions mined by tHDT, sentences are selected by jointly considering relevance, coherence and coverage. We build experimental systems to compare rival algorithms on 8 long-term events of public concern. The comparison demonstrates the effectiveness of our proposed model in terms of ROUGE metrics.
