Word sense learning based on feature selection and MDL principle

In this paper, we propose a word sense learning algorithm capable of unsupervised feature selection and cluster number identification. Feature selection for word sense learning is built on an entropy-based filter and formalized as a constrained optimization problem whose output is a set of important features. Cluster number identification is built on a Gaussian mixture model with an MDL-based criterion, and the optimal model order is inferred by minimizing that criterion. To evaluate the closeness between the learned sense clusters and the ground-truth classes, we introduce a weighted F-measure that models the effort needed to reconstruct the classes from the clusters. Experiments show that the algorithm retrieves important features, roughly estimates the number of classes automatically, and outperforms other algorithms in terms of the weighted F-measure. In addition, we apply the algorithm to the specific task of adding new words to a Chinese thesaurus.
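As a rough illustration of the cluster number identification step, the sketch below fits one-dimensional Gaussian mixtures of increasing order with EM and keeps the order that minimizes a description-length score. It is not the paper's implementation. The exact MDL-based criterion is not given in this abstract, so the sketch assumes the common two-part form, negative log-likelihood plus half the number of free parameters times log N. The function names `gmm_loglik_1d` and `select_order`, the quantile initialization, and the variance floor are all hypothetical choices made for the example.

```python
import math
from statistics import NormalDist


def gmm_loglik_1d(xs, k, iters=60):
    """Fit a k-component 1-D Gaussian mixture with EM and return the data
    log-likelihood evaluated in the final E-step."""
    n = len(xs)
    srt = sorted(xs)
    mean_all = sum(xs) / n
    var_all = max(sum((x - mean_all) ** 2 for x in xs) / n, 1e-12)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [var_all] * k
    weights = [1.0 / k] * k
    var_floor = 1e-2 * var_all  # assumed floor; stops a component collapsing
    loglik = 0.0
    for _ in range(iters):
        loglik = 0.0
        nk, sx, sxx = [0.0] * k, [0.0] * k, [0.0] * k
        for x in xs:
            # E-step, computed in the log domain to avoid underflow.
            logp = [math.log(w) - 0.5 * math.log(2 * math.pi * v)
                    - (x - m) ** 2 / (2 * v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logp)
            log_total = top + math.log(sum(math.exp(lp - top) for lp in logp))
            loglik += log_total
            for j, lp in enumerate(logp):
                r = math.exp(lp - log_total)  # responsibility of component j
                nk[j] += r
                sx[j] += r * x
                sxx[j] += r * x * x
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(nk[j], 1e-12)
            means[j] = sx[j] / nj
            variances[j] = max(sxx[j] / nj - means[j] ** 2, var_floor)
            weights[j] = nj / n
    return loglik


def select_order(xs, max_k):
    """Return the order in 1..max_k with the smallest assumed MDL score,
    together with the score of every candidate order."""
    n = len(xs)
    scores = {}
    for k in range(1, max_k + 1):
        n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
        scores[k] = -gmm_loglik_1d(xs, k) + 0.5 * n_params * math.log(n)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Toy data: two well-separated groups built from normal quantiles.
cluster = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
data = cluster + [x + 10.0 for x in cluster]
best_k, scores = select_order(data, max_k=4)
print(best_k, scores)
```

The penalty term grows with each added component, so a larger mixture is only preferred when it improves the likelihood enough to pay for its extra parameters. Real context vectors are high-dimensional, so a practical version would use a multivariate mixture over the selected features.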
