Provenance-Aware Semantic Search Engines Based on Data Integration Systems

Search engines are common tools for virtually every Internet user, and companies such as Google and Yahoo! have become household names. Semantic Search Engines try to augment and improve traditional Web Search Engines by using not just words, but concepts and logical relationships. Given the openness of the Web and the variety of sources involved, a Web Search Engine must evaluate the quality and trustworthiness of the data; a common approach to such assessments is to analyze the provenance of information. In this paper, we describe a relevant class of Provenance-aware Semantic Search Engines based on a peer-to-peer, mediator-based data integration architecture. Its architectural and functional features extend, with provenance, the SEWASIE semantic search engine developed within the IST EU SEWASIE project, which was coordinated by the authors. The methodology for creating a two-level ontology and the query processing engine developed within the SEWASIE project, together with the provenance extension, are fully described.
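As a rough illustration of how provenance can support quality assessment in a mediator, the Python sketch below works in four steps. It merges answers returned by different peers and records, for each attribute value, the set of sources that supplied it. It resolves conflicting values by preferring the more trusted source. Finally, it ranks integrated answers by their weakest-supported attribute. The functions `integrate` and `rank`, the peer identifiers, the per-source trust scores, and the conflict and scoring rules are all hypothetical assumptions made for this example. It is not the SEWASIE query processing engine.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """An integrated answer whose attribute values keep their provenance."""
    key: str
    values: dict = field(default_factory=dict)      # attribute -> value
    provenance: dict = field(default_factory=dict)  # attribute -> set of source ids


def integrate(source_results, source_trust):
    """Hypothetical mediator step: merge per-source records that share a key.

    source_results: source id -> list of records (dicts with a "key" entry).
    source_trust:   source id -> trust score in [0, 1], assumed to be given.
    On a conflict, the new value replaces the old one only if its source is
    strictly more trusted than every source supporting the current value.
    """
    merged = {}
    for source_id, records in source_results.items():
        for record in records:
            answer = merged.setdefault(record["key"], Answer(record["key"]))
            for attr, value in record.items():
                if attr == "key":
                    continue
                current = answer.values.get(attr)
                if current is None or current == value:
                    # New or agreeing value: remember this source as a supporter.
                    answer.values[attr] = value
                    answer.provenance.setdefault(attr, set()).add(source_id)
                else:
                    # Conflict: compare against the best source backing the current value.
                    best_existing = max(source_trust[s] for s in answer.provenance[attr])
                    if source_trust[source_id] > best_existing:
                        answer.values[attr] = value
                        answer.provenance[attr] = {source_id}
    return list(merged.values())


def rank(answers, source_trust):
    """Order answers by their weakest-supported attribute (hypothetical rule)."""
    def score(answer):
        return min(
            (max(source_trust[s] for s in srcs) for srcs in answer.provenance.values()),
            default=0.0,
        )
    return sorted(answers, key=score, reverse=True)


# Usage with two hypothetical peers that disagree on one attribute.
trust = {"peerA": 0.9, "peerB": 0.4}
results = {
    "peerA": [{"key": "acme", "city": "Modena", "employees": 120}],
    "peerB": [{"key": "acme", "city": "Bologna"}, {"key": "beta", "city": "Parma"}],
}
answers = rank(integrate(results, trust), trust)
for a in answers:
    print(a.key, a.values, a.provenance)
```

Keeping provenance as a set of sources per attribute, rather than a single score, means the conflict rule and the ranking rule can be changed independently. Both rules here are only one plausible choice among many.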
