Encoding classifications into lightweight ontologies

Classifications have been used for centuries to catalogue and search large sets of objects. In the early days these objects were mainly books; more recently they have come to include Web pages, pictures and other kinds of digital resources. Classifications describe their contents using natural language labels, an approach that has proved very effective in manual classification. However, natural language labels show their limitations when one tries to automate the process, because they make it very hard to reason about classifications and their contents. In this paper we introduce the novel notion of Formal Classification, a graph structure whose labels are written in a propositional concept language. Formal Classifications turn out to be a form of lightweight ontology. This, in turn, allows us to reason about them, to associate with each node a normal form formula that unambiguously describes its contents, and to reduce document classification and query answering to reasoning about subsumption.
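The reduction of classification and query answering to subsumption can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper. Formulas use a hypothetical tuple encoding. The concept of a node is taken to be the conjunction of the labels on its path from the root. A document is classified into the most specific nodes whose concepts subsume it, and a query returns the documents whose concepts it subsumes. Subsumption is checked by a brute-force truth table rather than a SAT solver, and the sketch omits both background knowledge (such as lexical relations between label terms) and conversion to a normal form. All node names, labels and helper functions (`A`, `concept`, `classify`, `answer`) are illustrative.

```python
from itertools import product

# ASSUMPTION: hypothetical tuple encoding of propositional concept formulas:
#   ("atom", name) | ("not", f) | ("and", f, g) | ("or", f, g)
def A(name):
    return ("atom", name)

def atoms(f):
    """Collect the atomic concepts occurring in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms(sub) for sub in f[1:]))

def holds(f, v):
    """Evaluate f under the truth assignment v (dict: atom -> bool)."""
    op = f[0]
    if op == "atom":
        return v[f[1]]
    if op == "not":
        return not holds(f[1], v)
    if op == "and":
        return holds(f[1], v) and holds(f[2], v)
    if op == "or":
        return holds(f[1], v) or holds(f[2], v)
    raise ValueError(f"unknown operator: {op}")

def subsumed(c, d):
    """c is subsumed by d iff (c AND NOT d) is unsatisfiable.
    Checked by brute-force truth table (adequate for a handful of atoms)."""
    names = sorted(atoms(c) | atoms(d))
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if holds(c, v) and not holds(d, v):
            return False
    return True

def conj(formulas):
    """Fold a non-empty list of formulas into a single conjunction."""
    result = formulas[0]
    for f in formulas[1:]:
        result = ("and", result, f)
    return result

# Hypothetical classification tree: node -> parent, node -> label formula.
parent = {"images": None, "europe": "images", "italy": "europe", "mountains": "europe"}
labels = {
    "images": A("image"),
    "europe": A("europe"),
    "italy": A("italy"),
    "mountains": ("or", A("mountain"), A("hill")),
}

def concept(node):
    """ASSUMPTION: a node's concept is the conjunction of labels on its root path."""
    path = []
    while node is not None:
        path.append(labels[node])
        node = parent[node]
    return conj(path[::-1])

def classify(doc):
    """Nodes whose concept subsumes doc, keeping only the most specific ones."""
    cands = [n for n in labels if subsumed(doc, concept(n))]

    def strictly_more_specific(m, n):
        return subsumed(concept(m), concept(n)) and not subsumed(concept(n), concept(m))

    return [n for n in cands
            if not any(strictly_more_specific(m, n) for m in cands if m != n)]

def answer(query, docs):
    """Names of documents whose concept is subsumed by the query concept."""
    return [name for name, c in docs.items() if subsumed(c, query)]

docs = {
    "d1": conj([A("image"), A("europe"), A("italy"), A("mountain")]),
    "d2": conj([A("image"), A("europe"), A("hill")]),
    "d3": conj([A("image"), A("europe"), A("beach")]),
}
query = conj([A("europe"), ("or", A("mountain"), A("hill"))])

print(classify(docs["d1"]))  # ['italy', 'mountains']
print(classify(docs["d2"]))  # ['mountains']
print(answer(query, docs))   # ['d1', 'd2']
```

Document `d1` lands in two sibling nodes because neither `italy` nor `mountains` is more specific than the other. The disjunctive label on `mountains` is what lets `d2` (a hill) be classified there and retrieved by the query.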
