A similarity coefficient is a function that computes the degree of similarity between a pair of text objects: two documents, two queries, or one document and one query. The resulting scores can also be used to rank the retrieved documents in order of presumed importance. A large number of similarity coefficients have been proposed in the literature, because no single best similarity measure exists yet. In this paper we present a comparative analysis of three similarity coefficients, namely the Jaccard, Dice, and Cosine coefficients, for finding the most relevant document for a given set of keywords. The analysis is performed using a genetic algorithm approach. Because genetic algorithms are randomized, the best fitness value is taken as the average over 10 runs of the same code for a fixed number of iterations. The similarity coefficients are computed for a set of documents retrieved from Google for a given query, and the average relevancy, expressed as fitness values based on those coefficients, is then calculated. For each query we averaged 10 different generations, obtained by running the program 10 times with the probability of crossover fixed at Pc = 0.7 and the probability of mutation fixed at Pm = 0.01. The same experiment was conducted for 10 queries.
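To make the three coefficients concrete, the following is a minimal sketch of their common set-based (binary term) forms. The abstract does not state how documents and queries are represented, so the whitespace tokenizer, the function names, and the sample strings here are illustrative assumptions rather than the paper's implementation.

```python
import math


def tokenize(text):
    """Hypothetical tokenizer: lowercase, split on whitespace, keep unique terms."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient: 2 |A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine coefficient for binary term vectors: |A ∩ B| / sqrt(|A| |B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Illustrative query and document (made-up strings, not data from the paper).
query = tokenize("genetic algorithm web crawling")
doc = tokenize("focused web crawling with a genetic algorithm")

for name, coefficient in (("Jaccard", jaccard), ("Dice", dice), ("Cosine", cosine)):
    print(f"{name}: {coefficient(query, doc):.3f}")
```

All three functions return a value between 0 and 1 and return 0 when either term set is empty. In a genetic algorithm setting, a score of this kind could plausibly serve as the fitness of a candidate document with respect to the query, although the abstract does not spell out the exact fitness function used.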