Leveraging Higher Order Dependencies Between Features for Text Classification

Traditional machine learning methods only consider relationships between feature values within individual data instances while disregarding the dependencies that link features across instances. In this work, we develop a general approach to supervised learning by leveraging higher-order dependencies between features. We introduce a novel Bayesian framework for classification named Higher Order Naive Bayes (HONB). Unlike approaches that assume data instances are independent, HONB leverages co-occurrence relations between feature values across different instances. Additionally, we generalize our framework by developing a novel data-driven space transformation that allows any classifier operating in vector spaces to take advantage of these higher-order co-occurrence relations. Results obtained on several benchmark text corpora demonstrate that higher-order approaches achieve significant improvements in classification accuracy over the baseline (first-order) methods.
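The core idea (co-occurrence relations that link features *across* instances, then a transformation that folds that evidence back into each instance's feature vector) can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's actual HONB construction: the function names, the binary thresholding, and the simple `X + X @ C` combination are all illustrative choices.

```python
import numpy as np

def cooccurrence(X):
    """Feature-by-feature co-occurrence counts across the corpus:
    C[i, j] = number of instances in which features i and j both occur."""
    B = (X > 0).astype(int)          # binarize presence of each feature
    C = B.T @ B                      # shared-instance counts for feature pairs
    np.fill_diagonal(C, 0)           # drop a feature's co-occurrence with itself
    return C

def higher_order_transform(X):
    """Map each instance into a space where a feature also receives weight
    from features it co-occurs with in *other* instances (a hypothetical
    second-order transform; the paper's transformation may differ)."""
    C = cooccurrence(X)
    return X + X @ C                 # first-order counts plus second-order evidence

# Toy document-term matrix: 3 documents, 4 terms.
X = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1]])
Xh = higher_order_transform(X)
```

In the toy corpus, document 0 never contains term 2, yet `Xh[0, 2]` is nonzero: term 1 appears in document 0 and co-occurs with term 2 in document 1, so the second-order path carries evidence across instances. Any vector-space classifier can then be trained on `Xh` instead of `X`, which is the sense in which the transformation is classifier-agnostic.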
