Extracting Information from the Web

Extracting structured, semantically meaningful information from the web is a difficult task from a programmatic point of view. The main reason is that most documents are available in human-readable form but lack a description of the structure or semantics of the data they contain; furthermore, their appearance may change unexpectedly, which complicates the problem. In this article, we present a framework that relieves web agent developers of the task of writing specific code to access the information they need from the web. This proposal achieves a complete separation between the logic an agent encapsulates and the way the information it needs is extracted, which enhances modularity, adaptability, and maintainability. It also allows developers to define the navigation path to the page that contains the information of interest, and it copes with unexpected changes in the information sources. Our approach is novel in that it combines different technologies to extract information from the web and associates semantics with it, which facilitates semantic interoperability in a multi-agent society.