Fuzzy semantic tagging and flexible querying of XML documents extracted from the Web

The relational database model is widely used in real applications. We propose a way of complementing such a database with an XML data warehouse. The approach is generic and driven by a domain ontology. The XML data warehouse is built from data extracted from the Web, which are semantically tagged using terms from the domain ontology. The semantic tagging is fuzzy: instead of tagging a value of a Web document with a single term of the domain ontology, we tag it with a possibility distribution representing a set of possible terms, each weighted by a possibility degree. Querying of the XML data warehouse is also fuzzy: end-users can express their preferences by means of fuzzy selection criteria. We present our approach on a first application domain, predictive microbiology.
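To make the idea of a fuzzy tag matched against a fuzzy selection criterion concrete, here is a minimal sketch. It is not the system described above. It assumes that a tag is a mapping from ontology terms to possibility degrees, that a selection criterion is a fuzzy set over the same terms, and that the two are compared with the standard possibility and necessity measures of possibility theory. The function names and the microbiology terms are hypothetical, chosen only for illustration.

```python
from typing import Dict

# A fuzzy tag or a fuzzy preference: term -> degree in [0, 1].
# Terms missing from the mapping are taken to have degree 0.
FuzzySet = Dict[str, float]


def possibility_of_match(tag: FuzzySet, preference: FuzzySet) -> float:
    """Degree to which it is possible that the tagged value satisfies the preference.

    Standard possibility measure: sup over terms of min(pi(term), mu(term)).
    """
    return max(
        (min(degree, preference.get(term, 0.0)) for term, degree in tag.items()),
        default=0.0,
    )


def necessity_of_match(tag: FuzzySet, preference: FuzzySet) -> float:
    """Degree to which it is certain that the tagged value satisfies the preference.

    Standard necessity measure: inf over terms of max(1 - pi(term), mu(term)).
    Terms outside the tag have pi = 0 and contribute 1, so they can be skipped.
    """
    return min(
        (max(1.0 - degree, preference.get(term, 0.0)) for term, degree in tag.items()),
        default=1.0,
    )


# Hypothetical fuzzy tag attached to a value extracted from a Web table.
tag = {"Listeria monocytogenes": 1.0, "Listeria innocua": 0.5}

# Hypothetical end-user preference expressed as a fuzzy selection criterion.
preference = {"Listeria monocytogenes": 1.0, "Listeria innocua": 0.25}

print(possibility_of_match(tag, preference))  # 1.0
print(necessity_of_match(tag, preference))    # 0.5
```

In this sketch the tagged value fully possibly matches the preference, since the most plausible term is also the preferred one, but it matches only with certainty 0.5, because a less preferred term remains possible to degree 0.5.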
