Exploiting Data Semantics to Discover, Extract, and Model Web Sources

We describe Deimos, a system that automatically discovers and models new sources of information. The system exploits four core technologies developed by our group that make an end-to-end solution to this problem possible. First, given an example source, Deimos finds other similar sources online. Second, it invokes these sources and extracts data from them. Third, given the syntactic structure of a source, Deimos maps its inputs and outputs to semantic types. Finally, it infers the source's semantic definition, i.e., the function that maps the inputs to the outputs. Deimos automates these steps by exploiting a combination of background knowledge and data semantics. We describe the challenges in integrating separate components into a unified approach to discovering, extracting, and modeling new online sources. We provide an end-to-end validation of the system in two information domains, showing that it can successfully discover and model new data sources in those domains.
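
The Python sketch below gives a rough picture of how the four stages fit together. It runs a toy version of the pipeline on in-memory data. Everything in it is hypothetical and chosen only for illustration. This includes the source names (ExampleWeather, NewWeather, BookSearch), the tag-overlap discovery heuristic, the regular-expression type recognizers, and the known functions ZipToCity and ZipToTemp. None of these are the system's actual components. Simulated callables stand in for web forms. The last stage simply picks the single known function that agrees with the most observed tuples; it does not search over compositions of sources.

```python
import re
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, str]
Source = Callable[[str], Row]  # toy stand-in for a web form: one input -> output fields

# Hypothetical semantic types, each recognized by a regular expression over values.
TYPE_PATTERNS = {
    "Zipcode": re.compile(r"\d{5}"),
    "TempF": re.compile(r"-?\d+F"),
    "City": re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"),
}

# Hypothetical toy data standing in for the web and for background knowledge.
ZIP_TO_CITY = {"90089": "Los Angeles", "10001": "New York", "60601": "Chicago"}
ZIP_TO_TEMP = {"90089": "72F", "10001": "55F", "60601": "48F"}
TAGS = {
    "ExampleWeather": {"weather", "zipcode"},
    "NewWeather": {"weather", "forecast"},
    "BookSearch": {"books"},
}
SOURCES: Dict[str, Source] = {
    "NewWeather": lambda z: {"out1": ZIP_TO_CITY[z], "out2": ZIP_TO_TEMP[z]},
}
KNOWN: Dict[str, Callable[[str], Optional[str]]] = {
    "ZipToCity": ZIP_TO_CITY.get,
    "ZipToTemp": ZIP_TO_TEMP.get,
}


def discover(example: str, tags: Dict[str, Set[str]]) -> List[str]:
    """Stage 1 (toy): rank other sources by how many tags they share with the example."""
    scored = [(len(tags[example] & t), name) for name, t in tags.items() if name != example]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


def extract(source: Source, inputs: List[str]) -> List[Row]:
    """Stage 2 (toy): invoke the source on sample inputs and collect one record per call."""
    return [{"in": x, **source(x)} for x in inputs]


def label(rows: List[Row]) -> Dict[str, Optional[str]]:
    """Stage 3: give each field the semantic type whose pattern matches most of its values."""
    labels: Dict[str, Optional[str]] = {}
    if not rows:
        return labels
    for field in rows[0]:
        values = [r[field] for r in rows]
        best, best_hits = None, 0
        for type_name, pattern in TYPE_PATTERNS.items():
            hits = sum(1 for v in values if pattern.fullmatch(v))
            if hits > best_hits:
                best, best_hits = type_name, hits
        labels[field] = best
    return labels


def infer_definition(rows: List[Row], known: Dict[str, Callable[[str], Optional[str]]], out: str) -> str:
    """Stage 4 (toy): pick the known function that reproduces the observed output most often."""
    def agreement(name: str) -> int:
        return sum(1 for r in rows if known[name](r["in"]) == r[out])
    return max(known, key=agreement)


if __name__ == "__main__":
    found = discover("ExampleWeather", TAGS)
    rows = extract(SOURCES[found[0]], ["90089", "10001", "60601"])
    print(label(rows))
    print(infer_definition(rows, KNOWN, "out1"), infer_definition(rows, KNOWN, "out2"))
```

Running the script prints the field labels {'in': 'Zipcode', 'out1': 'City', 'out2': 'TempF'} followed by ZipToCity ZipToTemp. In this toy, stages 3 and 4 rely only on the values the source returns, which loosely mirrors the abstract's emphasis on combining background knowledge with data semantics.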