Mix-n-Match: Building Personal Libraries from Web Content

We present an approach to web content aggregation that allows information to be harvested from web pages independently of specific markup languages. The approach builds on ideas from data warehousing, and we present solutions, adapted to this context, to the well-known data integration problems of equivalence detection and data cleaning. We describe how the content aggregation engine has been realised as an extensible framework so that end-users as well as developers can use the associated tools to create personal libraries of content extracted from the web.
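To make the two data integration problems concrete, the following is a minimal sketch of how cleaning and equivalence detection might be combined for short text fields harvested from different pages. It is an illustration only, not the framework's actual method: the function names (`clean`, `edit_distance`, `probably_equivalent`), the use of Levenshtein edit distance, and the `max_ratio` threshold are all assumptions introduced here.

```python
import re


def clean(text: str) -> str:
    """Hypothetical cleaning step: collapse whitespace runs, trim, lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


def probably_equivalent(a: str, b: str, max_ratio: float = 0.2) -> bool:
    """Treat two values as equivalent when, after cleaning, their edit
    distance is at most max_ratio of the longer value's length.
    The 0.2 default is an arbitrary illustrative threshold."""
    a, b = clean(a), clean(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return edit_distance(a, b) / longest <= max_ratio
```

Under these assumptions, `probably_equivalent("Piggy Bank", "piggy  bank")` holds because cleaning alone makes the two values identical, while unrelated strings fall well above the threshold. A real aggregation engine would likely need type-specific comparisons (dates, prices, names) rather than a single string metric.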
