Automatic discovery and ranking of synonyms for search keywords in the web

Search engines are an indispensable part of a web user's life. Most web users experience difficulties with keyword-based search engines, such as inaccurate results for queries and irrelevant URLs that are returned even though they contain the given keyword. In addition, relevant URLs may be missed because they contain a synonym of the keyword rather than the keyword itself. This condition is known as the polysemy problem. To alleviate these problems, we propose an algorithm called automatic discovery and ranking of synonyms for search keywords in the web (ADRS). The proposed method generates a list of candidate synonyms for each keyword by employing the relevance factor of the URLs associated with the synonyms. These candidate synonyms are then ranked using co-occurrence frequencies and various page count-based measures. A major advantage of our algorithm is that it is highly scalable, which makes it applicable to online data on the dynamic, domain-independent and unstructured World Wide Web. The experimental results show that the best results are obtained using the proposed algorithm with WebJaccard.
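As an illustration of page count-based ranking, the following minimal sketch scores candidate synonyms with a WebJaccard-style coefficient. It is not the ADRS implementation: the function names, the cut-off of 5 co-occurring pages, and the page counts are all assumptions made for this example, and real counts would come from search engine queries.

```python
from typing import Dict, List, Tuple


def web_jaccard(count_p: int, count_q: int, count_pq: int, threshold: int = 5) -> float:
    """Jaccard coefficient computed from page counts.

    count_p  : page count for the keyword alone
    count_q  : page count for the candidate alone
    count_pq : page count for the query containing both terms
    threshold: assumed cut-off; very small co-occurrence counts are
               treated as noise and scored 0
    """
    if count_pq <= threshold:
        return 0.0
    denominator = count_p + count_q - count_pq
    return count_pq / denominator if denominator > 0 else 0.0


def rank_candidates(
    keyword_count: int, candidates: Dict[str, Tuple[int, int]]
) -> List[Tuple[str, float]]:
    """Sort candidates by descending WebJaccard score.

    candidates maps each candidate term to a pair
    (page count of the candidate, page count of keyword AND candidate).
    """
    scored = [
        (term, web_jaccard(keyword_count, count_q, count_pq))
        for term, (count_q, count_pq) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical page counts for the keyword "car" and three candidates.
example_candidates = {
    "vehicle": (2000, 500),
    "automobile": (800, 400),
    "banana": (1500, 3),
}
ranking = rank_candidates(1000, example_candidates)
```

With these made-up counts, "automobile" ranks above "vehicle" because it shares a larger fraction of its pages with the keyword, while "banana" falls below the assumed cut-off and scores zero.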
