Beyond hybrid generative discriminative learning: spherical data classification

The blending of generative and discriminative approaches has prevailed through exploring and adopting the distinct characteristics of each approach, with the goal of constructing a complementary system that combines the best of both. The majority of current research in classification and categorization does not completely address the true structure and nature of the data for the particular application at hand. In contrast to most previous research, our proposed work focuses on the modeling and classification of spherical data, which are naturally generated in many data mining and knowledge discovery applications such as text classification, visual scene categorization, and gene expression analysis. This paper investigates a generative mixture model, based on the Langevin distribution, for clustering spherical data. In particular, we formulate a unified probabilistic framework in which probabilistic kernels are built from mixtures of Langevin distributions, using the Fisher score and information divergences, for Support Vector Machine classification. We demonstrate the effectiveness and merits of the proposed learning framework on synthetic data and on challenging applications involving spam filtering with both textual and visual email content.

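To make the hybrid pipeline concrete, the sketch below is a minimal, hypothetical illustration rather than the paper's implementation: it assumes a mixture of Langevin (von Mises-Fisher) distributions has already been fitted (for instance by EM), maps each unit-norm sample to its Fisher score with respect to the component mean directions, and trains a linear Support Vector Machine on those score vectors. The function names, the two-component toy mixture, and the synthetic labels are assumptions introduced purely for illustration; the information-divergence kernels mentioned in the abstract are not shown.

# Minimal sketch (assumed setup, not the authors' code): Fisher-score kernel
# features from a fitted mixture of Langevin (von Mises-Fisher) distributions,
# fed to a linear SVM. Data points are assumed to be L2-normalized.
import numpy as np
from scipy.special import ive
from sklearn.svm import SVC

def vmf_log_pdf(X, mu, kappa):
    """Log-density of the Langevin (von Mises-Fisher) distribution on the unit sphere."""
    d = X.shape[1]
    # log normalizing constant; ive(v, k) = exp(-k) * I_v(k), used for numerical stability
    log_c = (d / 2 - 1) * np.log(kappa) - (d / 2) * np.log(2 * np.pi) \
            - (np.log(ive(d / 2 - 1, kappa)) + kappa)
    return log_c + kappa * (X @ mu)

def fisher_scores(X, weights, mus, kappas):
    """Gradient of the mixture log-likelihood w.r.t. the mean directions,
    stacked into one feature vector per sample (a common Fisher-kernel choice)."""
    log_comp = np.stack([np.log(w) + vmf_log_pdf(X, mu, k)
                         for w, mu, k in zip(weights, mus, kappas)], axis=1)
    log_mix = np.logaddexp.reduce(log_comp, axis=1, keepdims=True)
    resp = np.exp(log_comp - log_mix)            # responsibilities r_j(x)
    # d/d mu_j log p(x) = r_j(x) * kappa_j * x   (unit-norm constraint on mu_j ignored)
    blocks = [resp[:, [j]] * kappas[j] * X for j in range(len(weights))]
    return np.hstack(blocks)

# Toy usage with a hypothetical 2-component mixture on the unit sphere in R^3:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = (X[:, 0] > 0).astype(int)                    # synthetic labels for illustration only
weights = np.array([0.5, 0.5])
mus = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
kappas = np.array([10.0, 10.0])
svm = SVC(kernel="linear").fit(fisher_scores(X, weights, mus, kappas), y)

Training a linear SVM on the stacked Fisher scores is equivalent to using the corresponding Fisher kernel (up to the information-matrix normalization, omitted here for brevity), which is what makes the generative mixture usable inside a discriminative classifier.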