Probabilistic Web Data Management

With the development of Web 2.0 technology, an enormous amount of data is generated every day. These data contain considerable uncertainty, caused by careless data entry, incomplete information, and inconsistencies among different data descriptions. Although significant effort has gone into finding effective and efficient solutions for managing and mining general uncertain data, little attention has been paid to managing uncertain data on the Web. This special issue was proposed to attract research on handling the uncertainty of Web data. It attracted 12 submissions; after two rounds of careful review by domain experts, we accepted three excellent papers. These three papers present new ideas for addressing issues in probabilistic Web data management.

The first paper, “An efficient approach to suggesting topically related Web queries using hidden topic model”, authored by Lin Li, Guandong Xu, Zhenglu Yang, Peter Dolog, Yanchun Zhang and Masaru Kitsuregawa, is about handling the uncertainty of queries on the Web [3]. Users often find it hard to formulate an appropriate query. The best strategy is to suggest Web queries that are related to the initial query. However, measuring the similarity between queries involves considerable uncertainty. In this paper, the authors proposed a solution based on hidden topics. Specifically, they first built a hidden topic model and then used the trained model to infer the topic distribution of a newly input query. Query similarity was measured through the topic distributions, and a suggestion list of candidate queries was computed from that similarity. The experimental study verified the effectiveness of the proposed approach in measuring the similarity between queries.

The second paper, “Efficient processing of top-k twig queries over probabilistic XML data”, authored by Bo Ning, Chengfei Liu, and Jeffrey Xu Yu, addressed twig query processing over probabilistic XML data [1].
Unlike in certain (deterministic) XML data, each twig answer in probabilistic XML data is associated with a probability value. Since the solutions for certain XML data do not consider uncertainty, they cannot be used for queries over probabilistic XML data. In this paper, the authors proposed a new encoding scheme, called

World Wide Web (2013) 16:271–272 DOI 10.1007/s11280-013-0205-9