Prototype/topic based clustering method for weblogs

The work of the third author was carried out within the framework of the WIQ-EI IRSES project (Grant No. 269180) under the FP7 Marie Curie programme, the DIANA-APPLICATIONS project, Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01), and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
