Towards Real Intelligent Web Exploration

A significant problem with the dominant web search model is that it lacks a realistic way to acquire the user's search context. Search engines rely on implicit feedback, which is extremely sparse and does not let users properly define what they want to know or what they think of the search results. In our proposed "web exploration engine", which we have implemented as a prototype, documents are automatically pre-classified into a large number of categories that represent a hierarchy of search contexts. Users can browse this structure or search within a particular category (context) by explicitly selecting it. Keyword relevance is not global but specific to each category. The main innovation we propose is the "floating" query that results from this feature: the original search query is re-evaluated, and the importance of its features recalculated, for every context the user explores. This allows users to search or browse in a truly local (context-dependent) way with minimal effort on their part.
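
To make the "floating" query idea concrete, the following is a minimal sketch, not the prototype's actual implementation. It assumes that the importance of a query term can be approximated by a smoothed inverse document frequency computed only over the documents of the selected category and its subcategories. The corpus, the category paths, and the names `local_idf` and `float_query` are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical toy corpus: (category path, document text).
# Category paths use "/" to express the hierarchy of search contexts.
CORPUS = [
    ("autos", "jaguar car engine"),
    ("autos", "car dealer price"),
    ("autos/repair", "car engine repair"),
    ("animals", "jaguar cat habitat"),
    ("animals", "jaguar prey jungle"),
    ("animals", "cat food car"),
]


def local_idf(term, docs):
    """Smoothed IDF of `term` over `docs` (a list of token sets).

    Assumption: IDF stands in for the per-context feature importance;
    the abstract does not specify the weighting scheme actually used.
    """
    n_docs = len(docs)
    doc_freq = sum(1 for tokens in docs if term in tokens)
    return math.log((n_docs + 1) / (doc_freq + 1)) + 1.0


def float_query(query, category, corpus):
    """Re-weight every query term using only documents in `category`
    and in its subcategories, so the weights change with the context."""
    docs = [
        set(text.split())
        for cat, text in corpus
        if cat == category or cat.startswith(category + "/")
    ]
    return {term: local_idf(term, docs) for term in query.split()}


# The same query is re-evaluated for each context the user explores.
weights_autos = float_query("jaguar car", "autos", CORPUS)
weights_animals = float_query("jaguar car", "animals", CORPUS)
```

In this toy data, "car" appears in every document under `autos`, so "jaguar" becomes the more discriminative term there, while under `animals` the ordering is reversed. The query text itself never changes; only its term weights float with the selected context.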
