Warehousing Web Resources with the WebContent Platform

We describe the WebContent platform for managing content from the Web. The platform is based on a service-oriented architecture and Web standards (notably Web services, XML and RDF). An enterprise service bus (following the JBI specification) and BPEL may be used to orchestrate service invocations. A peer-to-peer architecture may also be used to facilitate cooperation between independent partners and to provide scalability. We briefly describe the services developed to support the main functions of the platform: acquisition (e.g., Web crawling), semantic enrichment (e.g., concept annotation), large-scale XML storage and querying (in a centralized or P2P architecture), and exploitation (including Web-based interfaces). Ontologies are pervasive in WebContent applications: they support the description of the harvested and derived information as well as that of the applications themselves. WebContent brings together a large number of groups from industry and academia. The core of the platform is open source, and a large toolkit of both open-source and commercial services is already available. WebContent is being tested on several Web surveillance applications. In the paper, we use a strategic watch application in aeronautics, developed for Airbus, to illustrate various aspects of the platform. WebContent is now available for research and development outside the original group of participants.
