Open data categorization based on formal concept analysis

Government institutions have released a large number of datasets on their open data portals, in line with data transparency and open government initiatives. To make these datasets more accessible and visible, the portals categorize them by criteria such as publisher, category, format, and description. However, some of this metadata is often missing, so datasets cannot always be found through every one of these criteria. As a result, as the number of datasets on the portals continues to grow, it becomes harder to obtain the desired information. This paper addresses the issue by introducing the EODClassifier framework, which suggests the best-matching category for a dataset. It relies on formal concept analysis to generate a data structure that reveals the shared conceptualization originating from tag usage, and uses that structure as a knowledge base to categorize uncategorized open datasets.
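To make the idea concrete, the following is a minimal sketch of how formal concept analysis over dataset tags could drive a category suggestion. It is not the EODClassifier implementation. The toy context, the category labels, the function names, and the selection rule (pick the concept with the largest intent contained in the new dataset's tags, then take a majority vote over the categories in its extent) are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical formal context: dataset (object) -> set of tags (attributes).
CONTEXT = {
    "bus_routes": {"transport", "bus", "schedule"},
    "train_timetable": {"transport", "rail", "schedule"},
    "air_quality": {"environment", "pollution"},
    "river_levels": {"environment", "water"},
}

# Hypothetical categories of the already categorized datasets.
CATEGORIES = {
    "bus_routes": "Transport",
    "train_timetable": "Transport",
    "air_quality": "Environment",
    "river_levels": "Environment",
}


def extent(tags, context):
    """Return all datasets that carry every tag in `tags`."""
    return frozenset(d for d, t in context.items() if set(tags) <= t)


def intent(datasets, context):
    """Return all tags shared by every dataset in `datasets`."""
    if not datasets:
        # By convention, the empty set of objects shares all attributes.
        return frozenset(set().union(*context.values()))
    return frozenset(set.intersection(*(set(context[d]) for d in datasets)))


def formal_concepts(context):
    """Enumerate formal concepts (extent, intent) by closing every object subset.

    This naive enumeration is exponential in the number of datasets and is
    only suitable for toy contexts.
    """
    concepts = set()
    objects = list(context)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            shared_tags = intent(subset, context)
            concepts.add((extent(shared_tags, context), shared_tags))
    return concepts


def suggest_category(tags, context, categories):
    """Suggest a category for an uncategorized dataset described by `tags`.

    Assumed rule: choose the most specific concept whose intent is contained
    in `tags`, then return the most common category among its extent.
    Returns None when no non-trivial concept matches.
    """
    tags = set(tags)
    candidates = [
        (ext, itt)
        for ext, itt in formal_concepts(context)
        if ext and itt and itt <= tags
    ]
    if not candidates:
        return None
    # Prefer the largest intent; break ties by the smallest extent.
    best_extent, _ = max(candidates, key=lambda c: (len(c[1]), -len(c[0])))
    votes = Counter(categories[d] for d in best_extent)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(suggest_category({"transport", "schedule", "tram"}, CONTEXT, CATEGORIES))
```

In this sketch, a new dataset tagged with "transport", "schedule", and "tram" matches the concept whose intent is {"transport", "schedule"}, so the categories of the two datasets in that concept's extent decide the suggestion. A practical system would need a scalable lattice construction algorithm and a more careful tie-breaking rule than the one assumed here.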
