Intelligent Support for Information Retrieval of Web Documents

The main goal of this research was to investigate means of intelligent support for the retrieval of web documents. We have proposed the architecture of a web tool system called Trillian, which discovers the interests of users without requiring their interaction and uses those interests to search autonomously for related web content. Discovered pages are then suggested to the user. The discovery of user interests is based on an analysis of the documents the user has previously visited. We have created a module for completely transparent tracking of the user's movement on the web, which logs both the visited URLs and the contents of the web pages. The subsequent analysis step is based on a variant of the suffix tree clustering algorithm. We focus primarily on the overall design of the Trillian architecture and on the process of discovering topics of interest. We have implemented an experimental prototype of Trillian and evaluated the quality, speed and usefulness of the proposed system. We have shown that clustering is a feasible technique for extracting interests from web documents. We consider the proposed architecture promising and suitable for future extensions.
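To make the clustering step concrete, the following is a minimal sketch of phrase-based clustering in the spirit of suffix tree clustering. It is not the Trillian implementation: it replaces the suffix tree with a plain dictionary of word n-grams, and the function names (`base_clusters`, `merge_clusters`), the maximum phrase length, and the overlap threshold are all illustrative assumptions. Documents that share a phrase form a base cluster, and base clusters whose document sets overlap strongly are merged into one topic of interest.

```python
from collections import defaultdict
from itertools import combinations


def base_clusters(docs, max_len=3, min_docs=2):
    """Map each shared phrase (word n-gram) to the set of documents containing it.

    A real suffix tree finds shared phrases in linear time; this dictionary
    of n-grams is a simplified stand-in (assumption, not the paper's code).
    """
    phrase_docs = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[tuple(words[i:i + n])].add(doc_id)
    # Keep only phrases that occur in at least min_docs documents.
    return {p: ids for p, ids in phrase_docs.items() if len(ids) >= min_docs}


def merge_clusters(clusters, threshold=0.5):
    """Merge base clusters whose document sets overlap strongly.

    Two base clusters are linked when the shared documents exceed
    `threshold` of each cluster; linked clusters are joined with union-find.
    """
    phrases = list(clusters)
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(clusters[a] & clusters[b])
        if (common / len(clusters[a]) > threshold
                and common / len(clusters[b]) > threshold):
            parent[find(a)] = find(b)

    merged = defaultdict(lambda: {"phrases": [], "docs": set()})
    for p in phrases:
        root = find(p)
        merged[root]["phrases"].append(" ".join(p))
        merged[root]["docs"] |= clusters[p]
    return list(merged.values())


if __name__ == "__main__":
    visited = [
        "suffix tree clustering of web documents",
        "web documents and suffix tree construction",
        "cooking pasta with tomato sauce",
        "tomato sauce recipes",
    ]
    for topic in merge_clusters(base_clusters(visited)):
        print(sorted(topic["docs"]), topic["phrases"])
```

The sketch omits stop-word removal, stemming, and cluster scoring, all of which a practical system would likely need before the merged clusters could serve as search queries.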
