A Multi-Threaded Semantic Focused Crawler

The Web comprises voluminous, rich learning content. However, the ever-growing volume of learning resources leads to the problem of information overload. The large number of irrelevant results returned by search engines based on keyword matching further aggravates the problem. A learner in such a scenario needs search results that are semantically matched learning resources. Keeping in view the volume of content and the significance of semantic knowledge, this paper proposes a multi-threaded semantic focused crawler (SFC) specially designed and implemented to crawl the WWW for educational learning content. The proposed SFC expands a topic term using a domain ontology and initiates the crawl from a set of seed URLs. The results obtained from multiple iterations of the crawl on various topics are presented and compared with those obtained by running an open-source crawler on a similar dataset. The results are evaluated using two metrics: Semantic Similarity, a vector-space-model-based metric, and the harvest ratio.
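The abstract describes the crawl only at a high level. The Python sketch below shows one way the pieces could fit together: a topic term is expanded with its ontology neighbours, worker threads pull URLs from a shared frontier, each page is scored by the cosine similarity between its term-frequency vector and the expanded topic vector, and only links from pages that pass a relevance threshold are followed. The harvest ratio is computed as the fraction of crawled pages judged relevant. The ontology fragment, the one-hop expansion, raw term counts as weights, the 0.2 threshold, and the in-memory `WEB` dictionary standing in for HTTP fetches are all hypothetical choices for illustration, not the SFC's actual design.

```python
import math
import re
import threading
from collections import Counter
from queue import Queue

# Hypothetical ontology fragment: topic term -> directly related concepts.
ONTOLOGY = {"sorting": ["quicksort", "mergesort", "algorithm", "comparison"]}


def expand_topic(topic, ontology):
    """Expand a topic term with its one-hop ontology neighbours (an assumption)."""
    return [topic] + ontology.get(topic, [])


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def harvest_ratio(scores, threshold):
    """Fraction of crawled pages whose score meets the relevance threshold."""
    if not scores:
        return 0.0
    relevant = sum(1 for s in scores.values() if s >= threshold)
    return relevant / len(scores)


def crawl(seeds, fetch, topic_vector, threshold=0.2, max_pages=50, workers=4):
    """Multi-threaded focused crawl; returns {url: similarity score}."""
    frontier = Queue()
    seen = set(seeds)
    lock = threading.Lock()  # guards `seen` and `scores`
    scores = {}

    def worker():
        while True:
            url = frontier.get()
            if url is None:  # sentinel: no more work
                frontier.task_done()
                return
            try:
                with lock:
                    if len(scores) >= max_pages:
                        continue  # page budget spent: drain the frontier
                    scores[url] = 0.0  # reserve a slot for this page
                try:
                    text, links = fetch(url)
                except Exception:  # unreachable page: treat as irrelevant
                    text, links = "", []
                score = cosine_similarity(Counter(tokenize(text)), topic_vector)
                with lock:
                    scores[url] = score
                    if score >= threshold:  # follow links of relevant pages only
                        for link in links:
                            if link not in seen:
                                seen.add(link)
                                frontier.put(link)
            finally:
                frontier.task_done()  # lets frontier.join() track completion

    for url in seen:
        frontier.put(url)
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    frontier.join()  # wait until every enqueued URL has been processed
    for _ in threads:
        frontier.put(None)
    for t in threads:
        t.join()
    return scores


# Hypothetical in-memory "web" standing in for HTTP fetches: url -> (text, links).
WEB = {
    "http://seed.example/": ("Sorting algorithm lecture notes on quicksort and mergesort",
                             ["http://a.example/", "http://b.example/"]),
    "http://a.example/": ("Mergesort is a comparison sorting algorithm", ["http://c.example/"]),
    "http://b.example/": ("Holiday photos and travel tips", ["http://d.example/"]),
    "http://c.example/": ("Quicksort tutorial", []),
    "http://d.example/": ("More travel tips", []),
}


def fake_fetch(url):
    return WEB[url]


if __name__ == "__main__":
    topic = Counter(expand_topic("sorting", ONTOLOGY))
    scores = crawl(["http://seed.example/"], fake_fetch, topic)
    for url, score in sorted(scores.items()):
        print(f"{score:.3f}  {url}")
    print("harvest ratio:", harvest_ratio(scores, 0.2))
```

Because only pages that pass the threshold have their out-links enqueued, the off-topic page `http://b.example/` is scored but its link to `http://d.example/` is never followed, so three of the four crawled pages are relevant and the harvest ratio is 0.75. The shared `Queue` with `task_done()`/`join()` is one simple way to know when all threads have drained the frontier; a production crawler would add politeness delays, robots.txt handling, and real HTTP fetching.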
