Documents as multiple overlapping windows into grids of counts

In text analysis, documents are often represented as disorganized bags of words; models of such count features are typically based on mixing a small number of topics [1,2]. Recently, it has been observed that in many text corpora documents evolve into one another in a smooth way, with some features dropping out and new ones being introduced. The counting grid [3] models this spatial metaphor literally: it is a grid of word distributions, learned in such a way that a document's own distribution of features is modeled as the sum of the histograms found under a window into the grid. The major drawback of this method is that it is essentially a mixture: all of a document's content must be generated by a single contiguous area on the grid, which is especially problematic for lower-dimensional grids. In this paper, we overcome this issue by introducing the Componential Counting Grid, which brings the componential nature of topic models to the basic counting grid. We evaluated our approach on document classification and multimodal retrieval, obtaining state-of-the-art results on standard benchmarks.
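To make the window-into-grid idea concrete, the sketch below assembles a document's word distribution by summing the grid histograms under a window and normalizing. This is a minimal illustration, not the authors' implementation: the grid size, window size, toroidal wrap-around, and all variable names are assumptions.

```python
import numpy as np

# Illustrative counting grid: an E1 x E2 grid of word distributions pi,
# where pi[i, j] is a distribution over a vocabulary of size Z.
# All sizes below are assumed for illustration.
E1, E2, Z = 16, 16, 100
W1, W2 = 4, 4  # window size (treated here as a hyperparameter)

rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(Z), size=(E1, E2))  # random grid, for illustration only

def window_histogram(pi, k, W=(W1, W2)):
    """Sum the grid distributions under the W1 x W2 window whose
    upper-left corner is at location k = (i, j), wrapping around the
    grid edges (counting grids are conventionally toroidal), then
    normalize the sum back into a single distribution."""
    i, j = k
    rows = np.arange(i, i + W[0]) % pi.shape[0]
    cols = np.arange(j, j + W[1]) % pi.shape[1]
    h = pi[np.ix_(rows, cols)].sum(axis=(0, 1))
    return h / h.sum()

# A document mapped to window location (3, 7) is modeled by this histogram:
h = window_histogram(pi, (3, 7))
print(h.shape, h.sum())  # (100,) 1.0
```

Roughly, the componential variant proposed here relaxes the single-window constraint: each word token can be generated from its own window location, analogously to per-word topic assignments in componential topic models, rather than forcing the whole document through one contiguous window.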

[1] Arindam Banerjee, et al. Topic Models over Text Streams: A Study of Batch and Online Unsupervised Learning, 2007, SDM.

[2] D. Dunson, et al. Kernel stick-breaking processes, 2008, Biometrika.

[3] Alessandro Perina, et al. Investigating Topic Models' Capabilities in Expression Microarray Data Classification, 2012, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[4] Tony Jebara, et al. Probability Product Kernels, 2004, J. Mach. Learn. Res.

[5] Michael I. Jordan. Learning in Graphical Models, 1999, NATO ASI Series.

[6] Nebojsa Jojic, et al. Image analysis by counting on a grid, 2011, CVPR.

[7] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2003, J. Mach. Learn. Res.

[8] J. Tenenbaum, et al. A global geometric framework for nonlinear dimensionality reduction, 2000, Science.

[9] Thomas Hofmann. Unsupervised Learning by Probabilistic Latent Semantic Analysis, 2001, Machine Learning.

[10] B. Frey, et al. Transformation-Invariant Clustering Using the EM Algorithm, 2003, IEEE Trans. Pattern Anal. Mach. Intell.

[11] Nebojsa Jojic, et al. Multidimensional counting grids: Inferring word order from disordered bags of words, 2011, UAI.

[12] Geoffrey E. Hinton, et al. A View of the EM Algorithm that Justifies Incremental, Sparse, and other Variants, 1998, Learning in Graphical Models.

[13] John D. Lafferty, et al. Correlated Topic Models, 2005, NIPS.

[14] Bryan Silverthorn, et al. Spherical Topic Models, 2010, ICML.

[15] Michael I. Jordan, et al. Modeling annotated data, 2003, SIGIR.

[16] S. T. Roweis, et al. Nonlinear dimensionality reduction by locally linear embedding, 2000, Science.

[17] Kristin A. Cook, et al. Illuminating the Path: The Research and Development Agenda for Visual Analytics, 2005.

[18] Rajat Raina, et al. Classification with Hybrid Generative/Discriminative Models, 2003, NIPS.

[19] T. Minka. Estimating a Dirichlet distribution, 2012.

[20] Trevor Darrell, et al. Learning cross-modality similarity for multinomial data, 2011, International Conference on Computer Vision (ICCV).

[21] Gal Chechik, et al. Euclidean Embedding of Co-occurrence Data, 2007, J. Mach. Learn. Res.

[22] Yee Whye Teh, et al. On Smoothing and Inference for Topic Models, 2009, UAI.

[23] Andrew Zisserman, et al. Scene Classification Using a Hybrid Generative/Discriminative Approach, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24] Nebojsa Jojic, et al. Free Energy Score Spaces: Using Generative Information in Discriminative Classifiers, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.