Semantic Retrieval Approach for Web Documents

Because of the explosive growth of resources on the Internet, information retrieval technology has become particularly important. However, current retrieval methods are essentially based on full-text keyword matching; they lack semantic information and cannot capture the user's query intent well. These methods return a large amount of irrelevant information and fail to meet the user's needs. The systems built so far have not fully overcome the limitations of keyword-based search. Such systems are built on variations of classic models that represent information by keywords. Using the Semantic Web is one way to increase the precision of information retrieval systems. In this paper, we propose a semantic information retrieval approach that extracts information from web documents in a specific domain (jaundice diseases). The approach collects domain-relevant documents with a focused crawler based on a domain ontology, and retrieves content that is semantically similar to a given user query. It aims to discover semantically similar terms in documents and query terms using WordNet.
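The idea of matching query terms to document terms through semantically similar words, rather than exact keywords, can be illustrated with a minimal sketch. This is not the paper's implementation: the small `TOY_SYNSETS` table is a hypothetical stand-in for WordNet, and the function names (`expand`, `semantic_score`, `rank`) and the scoring rule (fraction of query terms matched) are assumptions chosen only for illustration.

```python
# Toy stand-in for WordNet synsets (hypothetical data, not taken from the paper).
TOY_SYNSETS = [
    {"jaundice", "icterus"},
    {"liver", "hepatic"},
    {"treatment", "therapy"},
]

PUNCTUATION = ".,;:!?()"


def expand(term):
    """Return the term plus every toy synonym that shares a synset with it."""
    expanded = {term}
    for synset in TOY_SYNSETS:
        if term in synset:
            expanded |= synset
    return expanded


def tokenize(text):
    """Lowercase the text, split on whitespace, strip surrounding punctuation."""
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def semantic_score(query, document):
    """Fraction of query terms found in the document, directly or via a synonym."""
    doc_terms = set(tokenize(document))
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    matched = sum(1 for term in query_terms if expand(term) & doc_terms)
    return matched / len(query_terms)


def rank(query, documents):
    """Return documents with a non-zero score, highest score first."""
    scored = [(semantic_score(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored if score > 0]
```

With this sketch, the query "jaundice treatment" matches a document that mentions only "icterus therapy", which plain keyword matching would miss. A fuller version would replace the toy table with real WordNet lookups and a graded similarity measure.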
