Learning Non-Linear Functions for Text Classification

In this paper, we show that generative classifiers are capable of learning non-linear decision boundaries and that non-linear generative models can outperform a number of linear classifiers on some text categorization tasks. We first prove that 3-layer multinomial hierarchical generative (Bayesian) classifiers, under a particular independence assumption, can learn only the same linear decision boundaries as a multinomial naive Bayes classifier. We then show that a different independence assumption introduces non-linearity, enabling these models to learn non-linear decision boundaries. Finally, we evaluate the performance of these non-linear classifiers on a series of text classification tasks.
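The baseline claim above, that multinomial naive Bayes induces a linear decision boundary in the word-count space, can be checked directly: the class log-posterior is an affine function of the count vector, so the two-class log-odds reduces to w·x + b. The sketch below illustrates this with toy parameters (the priors, vocabulary size, and word distributions are illustrative assumptions, not values from the paper).

```python
import numpy as np

# Two-class multinomial naive Bayes scores a document x (a vector of
# word counts) with  log p(c) + sum_w x_w * log p(w|c),
# so the log-odds between the two classes is affine in x and the
# decision boundary {x : log-odds = 0} is a hyperplane.

rng = np.random.default_rng(0)
V = 5  # toy vocabulary size (illustrative assumption)

log_prior = np.log([0.4, 0.6])                  # hypothetical class priors
word_probs = rng.dirichlet(np.ones(V), size=2)  # hypothetical p(w|c), one row per class
log_probs = np.log(word_probs)

def nb_log_odds(x):
    """log p(c=1|x) - log p(c=0|x), up to the shared evidence term."""
    scores = log_prior + x @ log_probs.T
    return scores[1] - scores[0]

# The same decision expressed as an explicit linear function w.x + b:
w = log_probs[1] - log_probs[0]
b = log_prior[1] - log_prior[0]

x = rng.integers(0, 10, size=V).astype(float)
assert np.isclose(nb_log_odds(x), w @ x + b)  # identical scores
```

The paper's contribution is to show which independence assumptions in a 3-layer hierarchy preserve this linearity and which break it; the sketch only establishes the linear baseline being compared against.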
