Paper information - Induction-based approach to personalized search engines

Induction-based approach to personalized search engines

In a document retrieval system, stored documents are compared with a specific query and with one another in order to find the document most similar to the query. The most similar document receives a higher weight than the others. When more than one document is proposed to the user, the documents must be sorted by weight. Once a recommender system presents the result, the user may open any document of interest. When two different recommender systems propose two different document lists, there is a need to determine which list is more efficient. The "Search Engine Ranking Efficiency Evaluation Tool" (SEREET) was created for this purpose. The tool assesses the efficiency of each document list and assigns it a numerical value. The value approaches 100% when ranking efficiency is high, meaning that the list contains more relevant documents and that they are sorted by their relevance to the user. The value approaches 0% when ranking efficiency is poor and none of the presented documents interest the user. The dissertation proposes a model for evaluating ranking efficiency and proves it mathematically. Many search engine mechanisms have been proposed for assessing the relevance of a web page. They focus on keyword frequency, page usage, link analysis, and various combinations of these. These methods have been tested and used to provide users with the web pages most interesting to them, according to their preferences. The dissertation develops a new collaborative filtering approach that retrieves the documents most interesting to the user according to his or her interests. Building a user profile is essential for identifying the user's interests and for assigning each user to a suitable category. 
This is a requirement for implementing collaborative filtering. Inference tools such as the time spent on a web page, mouse movement, page scrolling, and mouse clicks were investigated. The dissertation shows that the most efficient tool, and one that is sufficient on its own, is the time a user spends on a web page. To eliminate errors, the system introduces a low threshold and a high threshold for each user. When the time spent on a web page falls outside these thresholds, an error is reported. SEREET, which measures the efficiency of a search engine ranking list, is one of the dissertation's contributions to the scientific community. After considerable work, the conclusion was that the amount of time spent on a web page is the most important factor in determining a user's interest in that page, and that it is a sufficient tool that does not require support from other tools such as mouse movement or page scrolling. The results show that implicit rating is a satisfactory measure and can replace explicit rating. A new filtering technique was introduced to design a fully functional recommender system. The linear vector algorithm introduced here improves on the vector space algorithm (VSA) in time complexity and efficiency. The use of machine learning enhances the efficiency of the retrieved list. The machine learning algorithm is trained on positive and negative examples, which are mandatory for improving the system's error rate. The results show that the number of these examples increases in proportion to the error rate of the system.
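
The per-user time thresholds described in the abstract can be illustrated with a minimal sketch. The abstract gives no formula, so everything here is an assumption: the names `DwellThresholds`, `implicit_rating`, and `rate_visits` are hypothetical, a visit is treated as an error when its time lies below the low threshold or above the high threshold, and the linear mapping from time to rating is chosen only for illustration and is not the dissertation's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DwellThresholds:
    """Hypothetical per-user bounds on plausible reading time, in seconds."""
    low: float
    high: float


def implicit_rating(seconds_on_page: float, bounds: DwellThresholds) -> Optional[float]:
    """Return a rating in [0, 1], or None when the visit counts as an error.

    Assumption: times outside [low, high] are errors, and the rating grows
    linearly with time inside that interval.
    """
    if bounds.high <= bounds.low:
        raise ValueError("high threshold must exceed low threshold")
    if seconds_on_page < bounds.low or seconds_on_page > bounds.high:
        return None
    return (seconds_on_page - bounds.low) / (bounds.high - bounds.low)


def rate_visits(
    visits: Dict[str, float], bounds: DwellThresholds
) -> Tuple[List[Tuple[str, float]], List[str]]:
    """Split visits into rated documents (highest rating first) and errors."""
    rated: List[Tuple[str, float]] = []
    errors: List[str] = []
    for doc_id, seconds in visits.items():
        rating = implicit_rating(seconds, bounds)
        if rating is None:
            errors.append(doc_id)
        else:
            rated.append((doc_id, rating))
    # Sort by rating, descending, so the most interesting document comes first.
    rated.sort(key=lambda pair: pair[1], reverse=True)
    return rated, errors


# Illustrative values only; they are not taken from the dissertation.
bounds = DwellThresholds(low=5.0, high=600.0)
visits = {"doc_a": 2.0, "doc_b": 305.0, "doc_c": 1800.0, "doc_d": 60.0}
ranked, errors = rate_visits(visits, bounds)
```

With these made-up values, `doc_a` (2 seconds) and `doc_c` (1800 seconds) are reported as errors, while `doc_b` and `doc_d` are ranked by their implicit ratings.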