Higher Order Naïve Bayes: A Novel Non-IID Approach to Text Classification

Traditional machine learning algorithms assume that instances are independent and identically distributed (IID). This independence assumption prevents them from going beyond instance boundaries to exploit latent relations between features. In this paper, we develop a general approach to supervised learning that leverages higher order dependencies between features. We introduce a novel Bayesian classification framework termed Higher Order Naïve Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages higher order relations between features across different instances. We validate the approach on widely used benchmark text corpora. The results demonstrate that higher order approaches achieve significant improvements in classification accuracy over the baseline methods, especially when training data is scarce. A complexity analysis also shows that the space and time complexity of HONB compare favorably with existing approaches.
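
The abstract does not say how higher order relations between features are computed, so the following is only an illustrative sketch under one stated assumption: two terms are taken to be linked by a second-order relation when each co-occurs with a shared intermediate term, and the two co-occurrences can be found in different documents. The function name `second_order_pairs` and the toy corpus are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def second_order_pairs(docs):
    """Return term pairs (a, c) linked through a shared term across two documents.

    Assumption (not from the paper): a and c are second-order related when some
    term `mid` co-occurs with a in one document and with c in a different one.
    Pairs that also co-occur directly are not filtered out.
    """
    # First-order co-occurrence: unordered term pair -> ids of documents containing both.
    cooc = {}
    for doc_id, doc in enumerate(docs):
        for a, b in combinations(sorted(set(doc)), 2):
            cooc.setdefault((a, b), set()).add(doc_id)

    # Adjacency view: term -> {neighbouring term -> ids of shared documents}.
    neighbours = {}
    for (a, b), ids in cooc.items():
        neighbours.setdefault(a, {})[b] = ids
        neighbours.setdefault(b, {})[a] = ids

    # Chain two first-order links through a shared middle term.
    result = set()
    for mid, nbrs in neighbours.items():
        for a, c in combinations(sorted(nbrs), 2):
            # The path a - mid - c must be able to use two distinct documents,
            # which holds exactly when the two id sets span at least two documents.
            if len(nbrs[a] | nbrs[c]) >= 2:
                result.add((a, c))
    return result


# Toy corpus: "a" and "c" never share a document, but both co-occur with "b".
toy_docs = [["a", "b"], ["b", "c"], ["d", "e", "f"]]
pairs = second_order_pairs(toy_docs)
```

In the toy corpus the only pair returned is `("a", "c")`: the link crosses the boundary between the first and second documents, which is the kind of cross-instance relation an IID model cannot see. Terms `d`, `e`, and `f` share only a single document, so they yield no second-order pair under this definition.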
