Reasoning and Ontologies in Data Extraction

The web has become a pigsty: everyone dumps information in random places and in random shapes. Try to find the cheapest apartment in Oxford considering rent, travel, tax, and heating costs, or a cheap, reasonably well-reviewed 11-inch laptop with an SSD.
