Automatic Classification and Taxonomy Generation for Semi-structured Data

The problem of data classification goes back to the definition of taxonomies covering knowledge areas. With the advent of the Web, the amount of available data has increased by several orders of magnitude, making manual data classification infeasible. This work presents an approach based on prototype theory to automatically classify semi-structured data, represented by frames, without any prior knowledge of structured classes. Our approach uses a variation of the K-Means algorithm that organizes a set of frames into classes, structured as a strict hierarchy.
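
As a rough illustration of the idea of clustering frames around prototypes, the sketch below runs a K-Means-style loop over frames. It is not the algorithm of this work, whose details the abstract does not give. Everything in it is an assumption made for illustration: frames are modeled as Python dicts with hashable slot values, similarity is the Jaccard index over (slot, value) pairs, a prototype is the set of pairs shared by a strict majority of a cluster's members, and the names `features`, `jaccard`, `prototype`, and `cluster_frames` are hypothetical. It produces only a flat partition, not a hierarchy.

```python
from collections import Counter


def features(frame):
    """View a frame (a dict of slot -> value) as a set of (slot, value) pairs."""
    return frozenset(frame.items())


def jaccard(a, b):
    """Jaccard similarity between two feature sets (assumed similarity measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def prototype(members):
    """Assumed prototype: the pairs held by a strict majority of the members."""
    counts = Counter(pair for member in members for pair in member)
    return frozenset(p for p, c in counts.items() if 2 * c > len(members))


def cluster_frames(frames, k, max_iter=20):
    """K-Means-style loop: assign frames to the nearest prototype, then update."""
    items = [features(f) for f in frames]

    # Assumption: seed the prototypes with the first k distinct frames.
    protos = []
    for item in items:
        if item not in protos:
            protos.append(item)
        if len(protos) == k:
            break

    assignment = None
    for _ in range(max_iter):
        # Assignment step: pick the most similar prototype (ties go to the lowest index).
        new_assignment = [
            max(range(len(protos)), key=lambda j: jaccard(item, protos[j]))
            for item in items
        ]
        if new_assignment == assignment:
            break  # converged: no frame changed cluster
        assignment = new_assignment

        # Update step: recompute each non-empty cluster's prototype.
        for j in range(len(protos)):
            members = [it for it, a in zip(items, assignment) if a == j]
            if members:
                protos[j] = prototype(members)

    return assignment, protos


if __name__ == "__main__":
    frames = [
        {"type": "book", "author": "A", "pages": 100},
        {"type": "car", "wheels": 4, "brand": "X"},
        {"type": "book", "author": "B", "pages": 200},
        {"type": "car", "wheels": 4, "brand": "Y"},
    ]
    assignment, protos = cluster_frames(frames, k=2)
    print(assignment)
    for p in protos:
        print(sorted(p))
```

Replacing the numeric centroid of standard K-Means with a set of typical slot values is what lets the loop run on frames that share no fixed vector layout; other prototype definitions or similarity measures would fit the same loop.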
