Hierarchical Multi-Label Classification Using Web Reasoning for Large Datasets

Extracting valuable data from large volumes of data is one of the main challenges in Big Data. This paper presents a Hierarchical Multi-Label Classification process called Semantic HMC. The process aims to extract valuable data from very large data sources by automatically learning a label hierarchy and classifying data items. The Semantic HMC process is composed of five scalable steps: Indexation, Vectorization, Hierarchization, Resolution and Realization. The first three steps automatically construct a label hierarchy from statistical analysis of the data. This paper focuses on the last two steps, which classify items according to the label hierarchy. The process is implemented as a scalable, distributed application and deployed on a Big Data platform. A quality evaluation compares the approach with state-of-the-art multi-label classification algorithms dedicated to the same goal. The Semantic HMC approach outperforms state-of-the-art approaches in some areas.
