Learning Classifiers from Distributional Data

Many big data applications give rise to distributional data, in which objects or individuals are naturally represented as K-tuples of bags of feature values, with the values in each bag sampled from a feature- and object-specific distribution. We formulate and solve the problem of learning classifiers from distributional data. We consider three classes of methods for learning distributional classifiers: (i) those that rely on aggregation to encode distributional data as tuples of attribute values, i.e., instances that traditional supervised machine learning algorithms can handle; (ii) those that are based on generative models of distributional data; and (iii) the discriminative counterparts of the generative models in (ii). We compare the performance of these algorithms on real-world and synthetic distributional data sets. The results of our experiments demonstrate that classifiers that take advantage of the information available in the distributional instance representation match or outperform those that fail to fully exploit such information.
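As an illustration of the representation and of aggregation-based encoding (method class (i)), the following is a minimal sketch and not the authors' implementation. The example instance, the vocabulary, and the helper names `aggregate_mode`, `aggregate_histogram`, and `encode` are all hypothetical, and the choice of mode, mean, and relative-frequency histogram as aggregators is an assumption made for illustration.

```python
from collections import Counter
from statistics import mean

# A distributional instance is a K-tuple of bags, one bag per feature.
# Hypothetical example with K = 2: a bag of nominal values and a bag of numbers.
instance = (
    ["spam", "offer", "spam", "click"],  # feature 1: nominal values
    [3.0, 5.0, 4.0],                     # feature 2: numeric values
)

def aggregate_mode(bag):
    """Return the most frequent value in a non-empty bag of nominal values."""
    return Counter(bag).most_common(1)[0][0]

def aggregate_histogram(bag, vocabulary):
    """Return the relative frequency of each vocabulary value in a non-empty bag."""
    counts = Counter(bag)
    return [counts[v] / len(bag) for v in vocabulary]

def encode(instance, vocabulary):
    """Flatten one instance into a fixed-length numeric vector.

    Assumes (for illustration only) that feature 1 is nominal and
    feature 2 is numeric, and that both bags are non-empty.
    """
    nominal_bag, numeric_bag = instance
    return aggregate_histogram(nominal_bag, vocabulary) + [mean(numeric_bag)]

vocabulary = ["click", "offer", "spam"]  # assumed fixed value set for feature 1
vector = encode(instance, vocabulary)
```

A vector produced this way has the same length for every instance, so it can be passed to a standard supervised learner. Coarser aggregators such as the mode discard most of the information in a bag, which is the kind of loss the abstract attributes to methods that do not fully exploit the distributional representation.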
