A Temporal Frequent Itemset-Based Clustering Approach For Discovering Event Episodes From News Sequence

When performing environmental scanning, organizations typically deal with numerous events and topics concerning their core business, relevant technical standards, competitors, and markets, and each event or topic to be monitored or tracked is generally associated with many news documents. To reduce information overload and information fatigue when monitoring or tracking such events, it is essential to develop an effective event-episode discovery mechanism for organizing all news documents pertaining to an event of interest. In this study, we propose the time-adjoining frequent itemset-based event-episode discovery (TAFIED) technique. Building on the frequent itemset-based hierarchical clustering (FIHC) approach, TAFIED additionally considers the temporal characteristics of news articles, including the burst, novelty, and temporal proximity of features in an event episode, when discovering event episodes from the sequence of news articles pertaining to a specific event. Using the traditional feature-based hierarchical agglomerative clustering (HAC), HAC with a time-decaying function (HAC+TD), and FIHC techniques as performance benchmarks, our empirical evaluation results suggest that the proposed TAFIED technique outperforms all of these benchmarks in both cluster recall and cluster precision.
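To make the two ingredients named above more concrete, the following is a minimal illustrative sketch, not the TAFIED algorithm itself. It assumes each news document is reduced to a list of terms with a publication day, mines frequent term sets by brute-force counting (a simplification, not an optimized miner), and reports how tightly the supporting documents of a term set cluster in time. The function names, the toy documents, and the support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(docs, min_support, max_size=2):
    """Count term sets (up to max_size terms) and keep those that occur
    in at least min_support documents. Brute-force enumeration, for
    illustration only."""
    counts = Counter()
    for terms in docs:
        unique = sorted(set(terms))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def time_span(itemset, docs, days):
    """Days between the first and last document containing every term
    of the itemset (a crude proxy for temporal proximity)."""
    hits = [d for terms, d in zip(docs, days) if itemset <= set(terms)]
    return max(hits) - min(hits)


# Hypothetical news sequence for one event: term lists and publication days.
docs = [
    ["merger", "talks"],
    ["merger", "talks", "board"],
    ["merger", "approval"],
    ["approval", "regulator"],
    ["approval", "regulator", "merger"],
]
days = [0, 1, 5, 6, 7]

itemsets = frequent_itemsets(docs, min_support=2)
for itemset in sorted(itemsets, key=lambda s: sorted(s)):
    print(sorted(itemset), itemsets[itemset], time_span(itemset, docs, days))
```

In this toy sequence, a term such as "merger" is frequent across the whole event and spans all seven days, whereas the term sets {"merger", "talks"} and {"approval", "regulator"} are each supported by documents published within one day of each other. That contrast is the kind of signal a temporally aware method could use to separate episodes within a single event.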
