Managing Web data

The Web today consists almost exclusively of HTML documents designed for the human eye. While many of them are generated automatically by applications, it is difficult for other applications to read and process them. This may soon change, due to a series of new standards from the World Wide Web Consortium centered around XML (Extensible Markup Language). XML is designed to express a document's content, while HTML expresses its presentation. In short, XML is a data exchange format that applications can easily understand. It enables data exchange on the Web, both within an enterprise, across platforms (intranet), and between enterprises (internet). The focus of the Web thus shifts from document management to data management, and topics such as queries, views, data warehouses, and mediators, which were once the domain of databases, become of interest to the Web. However, the new data on the Web differs from traditional relational or object-oriented data: it is schema-less, self-describing, irregular, and heterogeneous. Recent database research has studied such data and called it semistructured data.
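To make "schema-less, self-describing, irregular" concrete, the following is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. The sample document, its element names (`people`, `person`, `name`, `email`, `phone`), and the helper `field_names` are hypothetical and chosen only for illustration; they do not come from any particular application.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: element names and values are invented.
# No schema is declared; the tags themselves describe the data.
DOC = """
<people>
  <person>
    <name>Alice</name>
    <email>alice@example.org</email>
  </person>
  <person>
    <name><first>Bob</first><last>Smith</last></name>
    <phone>555-0100</phone>
    <phone>555-0101</phone>
  </person>
</people>
""".strip()


def field_names(record):
    """Return the sorted set of child tag names of one record."""
    return sorted({child.tag for child in record})


root = ET.fromstring(DOC)

# Each record may have a different set of fields (irregular structure).
shapes = [field_names(person) for person in root.findall("person")]
for shape in shapes:
    print(shape)

# The same field can be atomic in one record and nested in another.
first_name = root[0].find("name")
second_name = root[1].find("name")
print(first_name.text, len(second_name))
```

The two `person` records share no fixed set of fields: one has an `email`, the other has two `phone` entries, and `name` is a plain string in the first but a nested structure in the second. A relational table would need a schema fixed in advance to hold such records, whereas here the structure is discovered from the data itself.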