Achieving High Recall and Precision with HTML Documents: An Innovative Approach in Information Retrieval

Information retrieval has become a challenge for researchers because of the rapid growth of digital and electronic information. Researchers are addressing this area by developing techniques that improve the precision and recall of retrieved documents. This paper presents an information retrieval system that shows promising results in terms of recall and precision. These results are achieved by developing an improved inverted index for the document set and an enhanced evaluation function for scoring the documents retrieved in response to a user query. The results are compared with two well-known techniques in the IR domain, Okapi-BM25 and the Bayesian inference network model, and show that the proposed method outperforms both in the precision and recall of the retrieved documents.
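For readers unfamiliar with the terms, the following is a minimal sketch of a plain inverted index and of how precision and recall are computed for one query. It is not the improved index or the evaluation function proposed in the paper, whose details the abstract does not give. The function names, the whitespace tokenization, the toy documents, and the relevance judgment are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Return ids of documents containing at least one query term."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return hits


def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy collection and relevance judgment.
docs = {
    1: "genetic algorithm for web retrieval",
    2: "html tags in web documents",
    3: "stemming for swedish text",
}
index = build_inverted_index(docs)
retrieved = retrieve(index, "web retrieval")
precision, recall = precision_recall(retrieved, relevant={1})
print(retrieved, precision, recall)
```

In this toy case documents 1 and 2 are retrieved but only document 1 is judged relevant, so recall is perfect while precision is one half. A ranking function such as the one the paper proposes aims to improve that trade-off.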
