Web mining

The Web grows and evolves faster than we would like or expect, posing scalability and relevance problems for Web search engines. There are three main types of data on the Web: content (text, multimedia), structure (links that form a graph), and usage (transactions from Web logs). We emphasize the last type of data, in particular a new subfield called query mining. Search engine server logs store traces of the queries submitted by users, which include the queries themselves along with the Web pages selected from their answers. Query mining is based on the fact that user queries in search engines and Web sites give valuable information about people's interests. In addition, the clicks that follow a query relate those interests to actual content. Our clustering framework is based on a new vectorial representation of query traces that allows them to be treated similarly to documents in traditional information retrieval systems. We also consider the problem of reducing the bias in the selections caused by the particular answer rankings computed by the search engine. We show the application of the clustering framework to two problems: boosting relevance ranking and query recommendation. Finally, we show experimentally the effectiveness of our approach. The same ideas can be applied to advertising campaigns in search engines and to the automatic generation of a pseudo-ontology for queries.
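
As a rough illustration of treating query traces like documents, the sketch below builds a term vector for each query from the text of the pages clicked in its answers and compares queries by cosine similarity. The weighting used here (click share multiplied by normalized term frequency) and the names query_vector and cosine are assumptions made for this example; the abstract does not specify the actual representation, and no correction for ranking bias is attempted.

```python
import math
from collections import Counter


def query_vector(clicked_docs):
    """Build a term vector for one query from its clicked pages.

    clicked_docs: list of (page_text, click_count) pairs (hypothetical format).
    Assumed weighting: share of clicks times max-normalized term frequency.
    """
    total_clicks = sum(clicks for _, clicks in clicked_docs)
    vector = Counter()
    if total_clicks == 0:
        return dict(vector)
    for text, clicks in clicked_docs:
        terms = text.lower().split()
        if not terms:
            continue
        tf = Counter(terms)
        max_tf = max(tf.values())
        for term, freq in tf.items():
            vector[term] += (clicks / total_clicks) * (freq / max_tf)
    return dict(vector)


def cosine(u, v):
    """Cosine similarity between two sparse term vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy query traces: each query maps to the pages clicked in its answers.
traces = {
    "jaguar car": [("jaguar sports car dealer", 3)],
    "jaguar price": [("jaguar car price list", 2)],
    "jaguar animal": [("big cat rainforest habitat", 5)],
}
vectors = {query: query_vector(docs) for query, docs in traces.items()}
sim_related = cosine(vectors["jaguar car"], vectors["jaguar price"])
sim_unrelated = cosine(vectors["jaguar car"], vectors["jaguar animal"])
```

With vectors of this kind, any standard document clustering method could in principle be applied to queries; in this toy data, the two car-related queries end up closer to each other than to the animal query even though all three share the word "jaguar".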