Induction-based approach to personalized search engines

In a document retrieval system, stored documents are compared with a specific query and with one another in order to find the document most similar to the query. The most similar document receives a higher weight than the others. When more than one document is proposed to the user, the documents must be sorted by weight. Once a recommender system presents the result, the user may open any document of interest. If two recommender systems present two different document lists, we need a way to decide which list is more efficient. The "Search Engine Ranking Efficiency Evaluation Tool" (SEREET) was created for this purpose. It assesses the efficiency of each document list and assigns the list a numerical value. The value approaches 100% when ranking efficiency is high, meaning that the list contains more relevant documents and that they are sorted by their relevance to the user. The value approaches 0% when ranking efficiency is poor and all of the presented documents are uninteresting to the user. The dissertation proposes a model for evaluating ranking efficiency and proves it mathematically. Many search engine mechanisms have been proposed for assessing the relevance of a web page. They focus on keyword frequency, page usage, link analysis, and various combinations of these. These methods have been tested and used to provide the user with the most interesting web pages according to his or her preferences. This dissertation develops a new collaborative filtering approach that retrieves the documents most interesting to the user according to his or her interests. Building a user profile is essential for identifying the user's interests and for placing each user in a suitable category.
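The behaviour described for a ranking efficiency score (near 100% when relevant documents are ranked first, 0% when nothing in the list is relevant) can be illustrated with a small sketch. This is not the SEREET formula, which the abstract does not give; it is a stand-in based on discounted cumulative gain, normalised against a list of the same length in which every document is fully relevant. The function name `ranking_efficiency`, the `max_relevance` parameter, and the logarithmic discount are all assumptions made for illustration.

```python
import math


def ranking_efficiency(relevances, max_relevance=1.0):
    """Score a ranked list as a percentage (illustrative, not SEREET).

    relevances: per-document relevance values in ranked order,
                each between 0 and max_relevance.
    The score is the position-discounted gain of the list divided by
    the gain of a same-length list where every document is fully relevant.
    """
    if not relevances:
        return 0.0
    # Assumed discount: documents lower in the list count for less.
    discounts = [1.0 / math.log2(rank + 1)
                 for rank in range(1, len(relevances) + 1)]
    gain = sum(r * d for r, d in zip(relevances, discounts))
    best = sum(max_relevance * d for d in discounts)
    return 100.0 * gain / best


# Two hypothetical lists from two recommender systems.
list_a = [1.0, 0.5, 0.0]   # relevant documents first
list_b = [0.0, 0.5, 1.0]   # same documents, worst order
better = "A" if ranking_efficiency(list_a) > ranking_efficiency(list_b) else "B"
```

Under these assumptions, two lists containing the same documents receive different scores depending on their order, which is the property needed to compare the output of two recommender systems.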
A user profile is a requirement for implementing collaborative filtering. Inference tools such as the time spent on a web page, mouse movement, page scrolling, mouse clicks, and others were investigated. The dissertation shows that the most efficient tool, and a sufficient one, is the time a user spends on a web page. To eliminate errors, the system introduces a low threshold and a high threshold for each user; when the time spent on a web page falls outside these thresholds, an error is reported. SEREET, which measures the efficiency of a search engine ranking list, is one of the contributions of this work to the scientific community. After considerable work, the conclusion was that the amount of time spent on a web page is the most important factor in determining a user's interest in that page, and that it is sufficient on its own, without support from other tools such as mouse movement or page scrolling. The results show that implicit rating is a satisfactory measure and can replace explicit rating. A new filtering technique was introduced to design a fully functional recommender system. The linear vector algorithm introduced here improves on the vector space algorithm (VSA) in time complexity and efficiency. The use of machine learning enhances the efficiency of the retrieved list. The machine learning algorithm is trained on positive and negative examples, which are necessary to reduce the system's error rate. The results show that the number of these examples increases in proportion to the error rate of the system.
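The per-user dwell-time thresholds can be sketched as follows. The abstract does not say how the thresholds are chosen or how time is converted into a rating, so the `DwellThresholds` type, the `implicit_rating` function, and the linear scaling between the two thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DwellThresholds:
    """Hypothetical per-user limits on plausible reading time, in seconds."""
    low_seconds: float
    high_seconds: float


def implicit_rating(seconds: float, limits: DwellThresholds) -> Optional[float]:
    """Map time spent on a page to an implicit rating in [0, 1].

    Returns None when the time falls outside the user's thresholds,
    which the caller should treat as a reported error (for example,
    an accidental click or a page left open unattended).
    """
    if seconds < limits.low_seconds or seconds > limits.high_seconds:
        return None
    span = limits.high_seconds - limits.low_seconds
    if span == 0:
        return 1.0
    # Assumed mapping: interest grows linearly between the two thresholds.
    return (seconds - limits.low_seconds) / span


limits = DwellThresholds(low_seconds=5.0, high_seconds=300.0)
ratings = [implicit_rating(t, limits) for t in (2.0, 152.5, 1000.0)]
```

In this sketch only the dwell time is used, in line with the finding that time spent on a page is sufficient without mouse movement or scrolling data.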
