Facet Annotation Using Reference Knowledge Bases

Faceted interfaces are omnipresent on the web, where they support data exploration and filtering. A facet is a triple: a domain (e.g., Book), a property (e.g., author or language), and a set of property values (e.g., Austen, Beauvoir, Coelho, Dostoevsky, Eco, Kerouac, Suskind, ... for author; French, English, German, Italian, Portuguese, Russian, ... for language). Given a property (e.g., language), selecting one or more of its values (e.g., English and Italian) returns the domain entities (of type Book) that match any of the selected values (the books written in English or Italian). To implement faceted interfaces in a way that scales to very large datasets, facet extraction must be automated. Prior work associates a facet domain with a set of homogeneous values, but does not annotate the facet property. In this paper, we annotate the facet property with a predicate from a reference Knowledge Base (KB) so as to maximize the semantic similarity between the property and the predicate. We define semantic similarity in terms of three new metrics: specificity, coverage, and frequency. Our experimental evaluation uses the DBpedia and YAGO KBs and shows that, for the facet annotation problem, we obtain better results than a state-of-the-art approach for the annotation of web tables, modified to annotate a set of values.
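The facet triple and the selection behavior described above can be illustrated with a minimal Python sketch. The `Facet` class, the `select` function, and the three sample books are hypothetical names and data introduced only for illustration; they are not taken from the paper, and the sketch does not cover the annotation method or its three metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facet:
    """A facet as a triple: domain, property, and set of property values."""
    domain: str
    prop: str
    values: frozenset


# Illustrative entities of type Book (sample data, assumed for this sketch).
books = [
    {"title": "Emma", "author": "Austen", "language": "English"},
    {"title": "Il nome della rosa", "author": "Eco", "language": "Italian"},
    {"title": "Das Parfum", "author": "Suskind", "language": "German"},
]

language_facet = Facet(
    domain="Book",
    prop="language",
    values=frozenset({"English", "Italian", "German"}),
)


def select(entities, facet, chosen):
    """Return the entities whose facet property matches any chosen value."""
    chosen = set(chosen)
    unknown = chosen - facet.values
    if unknown:
        raise ValueError(f"values not in facet: {sorted(unknown)}")
    return [e for e in entities if e.get(facet.prop) in chosen]


# Selecting English and Italian keeps the books written in either language.
selected = select(books, language_facet, {"English", "Italian"})
titles = [b["title"] for b in selected]
```

Selected values are combined disjunctively here, which mirrors the "English or Italian" reading in the text.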
