Keyword-based approaches to improve internet search

Technology keeps evolving, and so must the science of information retrieval. This thesis presents keyword-based approaches to improve information retrieval from the Internet. Both focused and unfocused queries to search engines are considered, and means of obtaining relevant documents are presented. For focused queries, techniques are provided to obtain a high precision score from the hit documents; these documents contain the exact answers to the focused query, which is usually a question. Each user query is subjected to an ambiguity test. If the query is found to be ambiguous, the user is given direction so that the intended meaning is the one actually searched, and the query is modified to form a new, clear, and unambiguous query. The query is sent to several search engines at the same time, and the hit documents from each of these search engines are collated and merged. Hit documents for an ambiguous query are analyzed and ranked based on their actual relevance to the query. Term frequency is used, along with a popularity score, to determine the total score of a relevant document. Every relevant hit document is classified based on its academic relevance. Eight academic categories are considered: (1) Course Notes, (2) Frequently Asked Questions, (3) Research Paper, (4) Technical Report, (5) Thesis, (6) Tutorial, (7) Review, and (8) Research Paper/Technical Report. Once a search is done, a set of relevant documents is presented, along with each document's academic relevance category (if any).
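
The merging and scoring steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not define the popularity score or how it is combined with term frequency, so this example assumes that popularity is the fraction of queried engines that returned a document and that the total score is a weighted sum. The function names (`merge_hits`, `term_frequency`, `rank`), the weights, and the input layout are all hypothetical.

```python
import re
from collections import Counter


def merge_hits(results_by_engine):
    """Collate hits from several engines into one dict keyed by URL.

    results_by_engine maps an engine name to a list of (url, text) pairs.
    Returns {url: {"text": str, "engines": set of engine names}}.
    """
    merged = {}
    for engine, hits in results_by_engine.items():
        for url, text in hits:
            entry = merged.setdefault(url, {"text": text, "engines": set()})
            entry["engines"].add(engine)
    return merged


def term_frequency(query, text):
    """Fraction of tokens in text that are query terms (0.0 for empty text)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    return sum(counts[t] for t in terms) / len(tokens)


def rank(query, results_by_engine, tf_weight=0.5, pop_weight=0.5):
    """Rank merged hits by a weighted sum of term frequency and popularity.

    ASSUMPTION: popularity is the share of engines that returned the URL,
    and the two scores are combined linearly. The source specifies neither.
    """
    merged = merge_hits(results_by_engine)
    n_engines = len(results_by_engine) or 1
    scored = []
    for url, entry in merged.items():
        tf = term_frequency(query, entry["text"])
        popularity = len(entry["engines"]) / n_engines
        scored.append((tf_weight * tf + pop_weight * popularity, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(url, score) for score, url in scored]
```

Calling `rank("binary search", {...})` with one entry per search engine returns a list of `(url, score)` pairs, best first. A document returned by several engines appears only once in the merged set and is credited with a higher popularity.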