Recursive Feature Generation for Knowledge-based Learning

When humans perform inductive learning, they often draw on background knowledge. With the increasing availability of well-formed collaborative knowledge bases, learning algorithms could perform significantly better if they had a way to exploit this knowledge. In this work, we present a novel algorithm that injects external knowledge into induction algorithms through feature generation. Given a feature, the algorithm defines a new learning task over that feature's set of values and uses the knowledge base to solve it. The resulting classifier then serves as a new feature for the original problem. We applied the algorithm to text classification using large semantic knowledge bases, and we show that the generated features significantly improve the performance of existing learning algorithms.
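
The feature-generation step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy knowledge base `KB`, the majority-vote labeling of values, the single-fact "stump" learner, and all function names are assumptions made for illustration. The sketch takes a bag-of-words feature, builds a learning task over its values (words), solves that task using knowledge-base facts about each word, and wraps the resulting classifier as a new document-level feature.

```python
from collections import Counter, defaultdict

# Hypothetical toy knowledge base: entity -> set of (relation, value) facts.
KB = {
    "aspirin": {("type", "drug")},
    "ibuprofen": {("type", "drug")},
    "codeine": {("type", "drug")},
    "paris": {("type", "city")},
    "london": {("type", "city")},
}

def build_value_task(docs, labels):
    """Build a learning task over feature values: label each word with
    the majority label of the documents that contain it (an assumption)."""
    votes = defaultdict(Counter)
    for words, y in zip(docs, labels):
        for w in words:
            votes[w][y] += 1
    return {w: c.most_common(1)[0][0] for w, c in votes.items()}

def learn_stump(value_labels, kb):
    """Solve the value-level task with the knowledge base: pick the single
    KB fact whose presence best agrees with the value labels."""
    facts = {f for v in value_labels for f in kb.get(v, ())}
    def agreement(fact):
        return sum((fact in kb.get(v, ())) == bool(y)
                   for v, y in value_labels.items())
    return max(sorted(facts), key=agreement)

def generate_feature(docs, labels, kb):
    """Return the chosen fact and a new binary feature for documents:
    1 if any word in the document is classified positive by the stump."""
    fact = learn_stump(build_value_task(docs, labels), kb)
    def feature(words):
        return int(any(fact in kb.get(w, ()) for w in words))
    return fact, feature

# Toy training set: documents as sets of words, with binary labels.
docs = [{"aspirin", "dose"}, {"ibuprofen", "pain"},
        {"paris", "trip"}, {"london", "trip"}]
labels = [1, 1, 0, 0]

fact, feature = generate_feature(docs, labels, KB)
```

In this toy setup the generated feature can fire on a document containing "codeine", a word that never appears in the training documents, because the knowledge base links it to the same fact as the positive training words. A real system would replace the stump with a full induction algorithm and could apply the same construction again to features found in the knowledge base.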
