Topic Detection over Online Forum

Topic detection is an active research topic in information retrieval. However, the Internet, whose content is largely user-generated, imposes new requirements and brings new challenges: topic detection must cope with the low quality of such content and its large amount of noise. This paper not only provides a solution for detecting hot topics but also produces semantic descriptions of them. Our method integrates two kinds of term features (local features and global features) and uses single-pass clustering to perform topic detection in a web forum. Our system efficiently filters out non-topic documents and produces readable topic descriptions. Compared with a baseline and the LDA topic model, our method achieves better performance and more readable results.
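Single-pass clustering reads each post exactly once. A post joins the most similar existing cluster if that similarity clears a threshold; otherwise it starts a new cluster. The following is a minimal sketch of that generic technique, not the authors' implementation. Whitespace tokenization, raw term counts (in place of the paper's combined local and global features, whose exact weighting is not given here), cosine similarity, a running-mean centroid, and the threshold of 0.3 are all illustrative assumptions.

```python
import math
from collections import Counter


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def single_pass(docs: list[str], threshold: float = 0.3) -> list[dict]:
    """Cluster documents in one scan (assumed setup: term counts + cosine)."""
    clusters: list[dict] = []  # each: {"centroid": {term: weight}, "members": [doc ids]}
    for i, doc in enumerate(docs):
        # Assumption: whitespace tokens; real forum text would need proper segmentation.
        vec = Counter(doc.lower().split())
        best, best_sim = None, 0.0
        for c in clusters:
            s = cosine(vec, c["centroid"])
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            n = len(best["members"])
            cent = best["centroid"]
            # Incremental mean: new = old + (x - old) / n
            for t in set(cent) | set(vec):
                cent[t] = cent.get(t, 0.0) + (vec.get(t, 0) - cent.get(t, 0.0)) / n
        else:
            # No cluster is similar enough, so this document seeds a new one.
            clusters.append({"centroid": {t: float(w) for t, w in vec.items()},
                             "members": [i]})
    return clusters


if __name__ == "__main__":
    posts = ["football match goal", "football goal score",
             "stock market price", "market price drop"]
    for c in single_pass(posts):
        print(c["members"])
```

On these four toy posts the sketch yields two clusters, `[0, 1]` and `[2, 3]`. Because each document is compared only against the clusters that already exist, the result depends on arrival order and on the threshold. In exchange, the method makes a single pass over the stream, which suits continuously growing forum data.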
