Tackling the Class-Imbalance Learning Problem in Semantic Web Knowledge Bases

In the Semantic Web context, procedures for deciding whether an individual belongs to a target concept in a knowledge base are generally based on automated reasoning. However, incompleteness and inconsistency are frequent, owing to the distributed, heterogeneous nature and the Web-scale dimension of these knowledge bases. It has been shown that resorting to models induced from the data may offer comparably effective and efficient solutions in such cases, although skewness in the instance distribution may affect the quality of these models. This is known as the class-imbalance problem. To cope with this problem for knowledge bases expressed in the standard Web ontology languages, we propose a machine learning approach based on the induction of Terminological Random Forests, an extension of the notion of Random Forest. We show experimentally the feasibility of our approach and its effectiveness with respect to related methods, especially on imbalanced datasets.
