Retrieval of Relevant Web Pages by a New Filtering Method

With the vast number of pages published worldwide, and especially since the advent of the web, it has become increasingly difficult to find the relevant pages in response to a query. Furthermore, manually filtering indexed web pages is a laborious task. A new method is therefore needed to filter both annotated web pages (produced by a semantic annotation process) and non-annotated web pages (retrieved from the Google search engine), and to group the pages that are relevant to the user. In this chapter, the authors first summarize their previous work on the semantic annotation of web pages. They then define a new filtering method based on three activities. They also present their component for querying and filtering web pages, whose purpose is to demonstrate the feasibility of the filtering method. Finally, the authors present an evaluation of this component, which has demonstrated its performance in multiple domains, and they discuss the use of the extended Boolean retrieval method in the new filtering method.
