Semantic Models for Style-Based Text Clustering

This paper examines the role of concept-based representations in document clustering as a support for knowledge discovery. Computational Intelligence algorithms can benefit from semantic networks when defining the similarity between pairs of documents. After systematically analyzing how semantic networks are tuned, the research defines and evaluates a novel semantic-based metric that integrates both classical and style-related features of texts. Experimental results confirm the effectiveness of the approach, showing that feeding a refined semantic representation into a clustering engine yields consistent structures for information retrieval and knowledge acquisition.
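As a rough illustration of what a metric combining classical and style-related features might look like, the sketch below blends a content distance with a style distance. It is not the metric defined in the paper: the convex combination with weight `alpha`, the use of cosine distance on raw term frequencies, and the two style features (average word length and type-token ratio) are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Classical content features: raw term counts (assumed, no stemming or stop words)."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def style_features(text):
    """Hypothetical style descriptors: average word length and type-token ratio."""
    words = text.split()
    if not words:
        return (0.0, 0.0)
    avg_len = sum(len(w) for w in words) / len(words)
    type_token = len({w.lower() for w in words}) / len(words)
    return (avg_len, type_token)


def style_distance(text_a, text_b):
    """Euclidean distance between the two style descriptors."""
    fa, fb = style_features(text_a), style_features(text_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def combined_distance(text_a, text_b, alpha=0.5):
    """Assumed blend: alpha weights content, (1 - alpha) weights style."""
    content = cosine_distance(term_frequencies(text_a), term_frequencies(text_b))
    return alpha * content + (1.0 - alpha) * style_distance(text_a, text_b)
```

Setting `alpha=1.0` reduces the sketch to a purely content-based distance, and `alpha=0.0` to a purely style-based one. Note that the style term here is not normalized to the same range as the cosine term, so a practical version would probably rescale the style features first.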
