A Fast Approximate AIB Algorithm for Distributional Word Clustering

Distributional word clustering merges words with similar probability distributions to attain reliable parameter estimation, compact classification models, and sometimes even better classification performance. Agglomerative Information Bottleneck (AIB) is a typical word clustering algorithm and has been applied to both traditional text classification and, more recently, image recognition. Despite its theoretical elegance, AIB suffers from poor computational efficiency, especially when clustering a large number of words. Unlike existing solutions to this issue, we analyze the characteristics of its objective function, the loss of mutual information, and show that good candidate word pairs for merging can be identified easily from nothing more than the ratio of each word's word-class joint probabilities. Based on this finding, we propose a fast approximate AIB algorithm and show that it significantly improves the computational efficiency of AIB while maintaining, or even slightly increasing, its classification performance. Experiments on benchmark text and image classification data sets show that our algorithm achieves more than a 100-fold speedup over the state-of-the-art method on large real data sets.
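To make the idea concrete, the sketch below is a minimal, hypothetical illustration for the two-class case, not the paper's actual implementation. It assumes a joint-probability matrix `P` of shape (n_words, n_classes); `merge_cost` computes the standard AIB merge criterion (the loss of mutual information, i.e. the prior-weighted Jensen-Shannon divergence of the two class-conditional distributions), and `approx_aib_candidates` shows how sorting words by the ratio of their word-class joint probabilities restricts the search to adjacent pairs in the sorted order. The function and variable names are illustrative assumptions.

```python
import numpy as np

def merge_cost(P, i, j):
    """Loss of mutual information from merging words i and j (standard AIB).

    P is an (n_words, n_classes) array of joint probabilities p(w, c).
    The cost equals (p(w_i) + p(w_j)) * JS_pi(p(c|w_i), p(c|w_j)), the
    prior-weighted Jensen-Shannon divergence of the class conditionals.
    """
    pi, pj = P[i].sum(), P[j].sum()
    ci, cj = P[i] / pi, P[j] / pj        # p(c|w_i) and p(c|w_j)
    m = (P[i] + P[j]) / (pi + pj)        # class conditional of the merged word

    def kl(p, q):
        mask = p > 0                     # 0 * log(0/q) = 0 by convention
        return np.sum(p[mask] * np.log(p[mask] / q[mask]))

    return pi * kl(ci, m) + pj * kl(cj, m)

def approx_aib_candidates(P, eps=1e-12):
    """Hypothetical candidate selection for the two-class case: words with
    similar joint-probability ratios have similar class conditionals, so
    only pairs adjacent in the sorted ratio order are evaluated."""
    ratio = P[:, 0] / (P[:, 1] + eps)    # ratio p(w, c1) / p(w, c2)
    order = np.argsort(ratio)
    pairs = [(order[k], order[k + 1]) for k in range(len(order) - 1)]
    costs = [merge_cost(P, i, j) for i, j in pairs]
    return pairs[int(np.argmin(costs))]  # cheapest pair to merge next
```

Under these assumptions, each agglomeration step evaluates the exact merge cost on only O(n) adjacent pairs rather than all O(n^2) pairs, which is the source of the speedup the abstract describes.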
