Knowledge Discovery in Web-Directories: Finding Term-Relations to Build a Business Ontology

The Web continues to grow at a tremendous rate, and search engines find it increasingly difficult to provide useful results. To manage this rapidly growing number of Web documents, automatically clustering documents and organising them into domain-dependent directories has become very popular. In most cases, these directories represent a hierarchical structure of categories and sub-categories for domains and sub-domains. To populate these directories with instances, individual documents are automatically analysed and placed into them according to their relevance. Though individual documents in these collections may not be ranked efficiently, together they provide an excellent knowledge source for facilitating ontology construction in that domain. In (mainly automatic) ontology construction, we need to find and use relevant knowledge for a particular subject or term. News documents provide an excellent source of relevant and up-to-date knowledge. In this paper, we focus on building business ontologies. To do so, we use news documents from business domains to obtain up-to-date knowledge about a particular company. To extract this knowledge in the form of important “terms” related to the company, we apply a novel method to find “related terms” given the company name. We show by examples that our technique can be successfully used to find “related terms” in similar cases.
