Inductive Transfer for Text Classification using Generalized Reliability Indicators

Machine-learning researchers face the omnipresent challenge of developing predictive models that converge rapidly in accuracy with increases in the quantity of scarce labeled training data. We introduce Layered Abstraction-Based Ensemble Learning (LABEL), a method that shows promise in improving generalization performance by exploiting additional labeled data drawn from related discrimination tasks within a corpus and from other corpora. LABEL first maps the original feature space, targeted at predicting membership in a specific topic, to a new feature space aimed at modeling the reliability of an ensemble of text classifiers. The resulting abstracted representation is invariant across each of the binary discrimination tasks, allowing the data to be pooled. We then construct a context-sensitive combination rule for each task using the pooled data. Thus, we are able to more accurately model domain structure, which would not have been possible using only the limited labeled data from each task separately. Using several corpora for an empirical evaluation of topic classification accuracy of text documents, we demonstrate that LABEL can increase generalization performance across a set of related tasks.
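The pipeline the abstract describes can be sketched in a few lines: map each document's raw classifier outputs to task-invariant reliability indicators, pool the resulting examples across related tasks, and fit a combination rule on the pooled data. The sketch below is a minimal stdlib-only illustration under stated assumptions, not the paper's actual method: the indicator set (length, mean confidence, agreement), the logistic combiner, and all function names are hypothetical simplifications of LABEL's context-sensitive combination rule.

```python
import math
import random

def indicator_features(doc_tokens, scores):
    """Map raw per-task classifier scores to task-invariant reliability
    indicators: normalized document length, mean classifier confidence,
    and inter-classifier agreement. None of these refer to task-specific
    vocabulary, so examples from different tasks share one feature space."""
    length = min(len(doc_tokens) / 100.0, 1.0)     # normalized length
    mean_conf = sum(scores) / len(scores)          # average confidence
    agreement = 1.0 - (max(scores) - min(scores))  # 1.0 = full agreement
    return [1.0, length, mean_conf, agreement]     # leading bias term

def train_combiner(pooled, lr=0.5, epochs=200):
    """Fit a logistic-regression combination rule on pooled
    (indicator_vector, label) pairs drawn from several tasks."""
    w = [0.0] * len(pooled[0][0])
    for _ in range(epochs):
        for x, y in pooled:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi          # gradient-ascent update
    return w

def predict(w, x):
    """Apply the learned combination rule to one indicator vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Toy pooled dataset: two related binary discrimination tasks, where the
# label marks whether the ensemble's vote is reliable. High agreement
# between the two base scores is (by construction) a sign of reliability.
random.seed(0)
pooled = []
for task in range(2):
    for _ in range(50):
        n_tokens = random.randint(10, 200)
        s1, s2 = random.random(), random.random()
        reliable = 1 if abs(s1 - s2) < 0.5 else 0
        x = indicator_features(["w"] * n_tokens, [s1, s2])
        pooled.append((x, reliable))

w = train_combiner(pooled)
```

Because the indicator space is shared, the combiner is trained on 100 pooled examples rather than 50 per task, which is the core leverage the abstract claims: the pooled data supports a combination rule that neither task's labeled data could estimate reliably on its own.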
