Interaction techniques for automating collecting and organizing personal web content

The growth of the World Wide Web has led to a dramatic increase in accessible information. People now use the Web for a wide variety of activities, including travel planning, comparison shopping, entertainment, and research. However, the tools available for collecting, organizing, and sharing Web content have not kept pace with this growth: people still rely on bookmarks, email, and printers to manage Web content. In this thesis, I present three contributions: a set of semi-automatic interaction techniques for retrieving content from the Web using the structure of webpages; presentation principles based on layout templates for user-guided organization of content from any number of Web sources; and a new template-based search paradigm for the Web that transforms keyword search into a goal-oriented, rich visual experience. To demonstrate the efficacy of these ideas, I combined them into a working system and evaluated them through a three-month longitudinal user study. Finally, the ideas presented in this thesis, if popularized by an online Web community, would allow average Web users to build a kind of machine-readable "Semantic Web" piece by piece as they go about accomplishing their personal tasks.
