Automatic keyword prediction using Google similarity distance

In this paper, we present a new approach that helps users work with search engines without entering any keywords. Our goal is to predict the words a user may want to search for before the user has thought of them. Most studies in this field focus either on helping users enter keywords or on re-ranking search results to make them more precise. Both kinds of method require a user behavior model and a repository in which to store the logs. Our proposed method instead uses the Google similarity distance to measure the keywords in the Web page and find potential keywords for the user. It therefore needs no repository, and all computation is performed online and in real time. We then extract the important keywords as potential search keywords, and these specialized keywords can be used to obtain precise search results. We believe this approach can be useful in many areas, such as e-learning, and can also be applied on mobile devices.
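To make the distance measure concrete, the following minimal sketch computes the normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f denotes a search hit count and N the number of indexed pages. The hit counts are hypothetical values supplied by hand rather than fetched from a search engine, and the helper `rank_terms`, which orders terms by their mean distance to the other terms on a page, is an illustrative assumption and not necessarily the extraction procedure used in this paper.

```python
import math
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from hit counts.

    fx, fy: hit counts for each term alone; fxy: hit count for both terms
    together; n: number of indexed pages (assumed larger than fx and fy).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms that never co-occur are treated as unrelated
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def rank_terms(counts, pair_counts, n):
    """Hypothetical helper: order terms by mean NGD to the other terms.

    pair_counts is keyed by alphabetically sorted (term_a, term_b) tuples.
    Terms with a smaller mean distance come first.
    """
    totals = {term: 0.0 for term in counts}
    for a, b in combinations(sorted(counts), 2):
        d = ngd(counts[a], counts[b], pair_counts.get((a, b), 0), n)
        totals[a] += d
        totals[b] += d
    others = max(len(counts) - 1, 1)
    return sorted(counts, key=lambda term: totals[term] / others)


# Made-up hit counts, for illustration only.
counts = {"distance": 100, "google": 100, "zebra": 100}
pair_counts = {
    ("distance", "google"): 50,
    ("distance", "zebra"): 1,
    ("google", "zebra"): 1,
}
ranking = rank_terms(counts, pair_counts, n=10_000)
```

With these made-up counts, "zebra" rarely co-occurs with the other two terms, so it has the largest mean distance and is ranked last. In an online setting the counts would come from live search queries, which is why no local log repository is needed.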
