deqa: Deep Web Extraction for Question Answering

Despite decades of effort, intelligent object search remains elusive. Neither search engine nor semantic web technologies alone have managed to provide usable systems for simple questions such as "find me a flat with a garden and more than two bedrooms near a supermarket." We introduce deqa, a conceptual framework that achieves this goal by combining state-of-the-art semantic technologies with effective data extraction. We apply deqa to the UK real estate domain and show that it can answer a significant percentage of such questions correctly. deqa does so by mapping natural language questions to SPARQL patterns, which are then evaluated on an RDF database of current real estate offers. The offers are obtained by running OXPath, a state-of-the-art data extraction system, on the websites of the major agencies in the Oxford area, and are linked through LIMES to background knowledge such as the location of supermarkets.
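
To make the mapping from question to query pattern concrete, the following is a minimal sketch, not deqa's actual implementation. It assumes the example question has already been parsed into a small structured form, and it assumes a made-up vocabulary: the `ex:` prefix, the properties `ex:bedrooms`, `ex:hasFeature`, and `ex:near`, and the class names are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class FlatQuestion:
    """Structured reading of a question; all field names are hypothetical."""
    min_bedrooms_exclusive: int = 0
    required_features: list = field(default_factory=list)
    near_place_type: str = ""


def to_sparql(q: FlatQuestion) -> str:
    """Render the structured question as a SPARQL SELECT query string."""
    lines = [
        # Hypothetical namespace, used for illustration only.
        "PREFIX ex: <http://example.org/realestate#>",
        "SELECT DISTINCT ?flat WHERE {",
        "  ?flat a ex:Flat ;",
        "        ex:bedrooms ?bedrooms .",
    ]
    # One triple pattern per required feature, e.g. a garden.
    for feature in q.required_features:
        lines.append(f"  ?flat ex:hasFeature ex:{feature} .")
    # Proximity is assumed to be precomputed as ex:near links
    # between offers and background-knowledge places.
    if q.near_place_type:
        lines.append("  ?flat ex:near ?place .")
        lines.append(f"  ?place a ex:{q.near_place_type} .")
    # "More than N bedrooms" becomes a strict numeric filter.
    lines.append(f"  FILTER(?bedrooms > {q.min_bedrooms_exclusive})")
    lines.append("}")
    return "\n".join(lines)


# "find me a flat with a garden and more than two bedrooms near a supermarket"
question = FlatQuestion(2, ["Garden"], "Supermarket")
query = to_sparql(question)
print(query)
```

In this sketch the proximity constraint relies on links that would have been materialised beforehand, which mirrors the role the abstract gives to the linking step; how deqa actually represents such links is not stated here.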
