Spotting Ontological Lacunae through Spectrum Analysis of Retrieved Documents

The World Wide Web has produced a potentially unlimited flow of information of varying quality, a condition known as “information overload”. Information filters have been designed in the hope of delivering only the relevant part to the user. Such filters require that the user formulate a more or less stable information need. Often, however, an ‘anomalous state of knowledge’ occurs: the user needs information but is not aware of this need. With the rapid growth of information available online, it is likely that this need could be filled, if only there were a system to detect it on the user’s behalf. As part of our project on ontology-based proactive information filtering (PROFILE), we designed such a system. It consists of: (1) an ontology in which the user’s more or less stable information need is represented, (2) WordNet, to translate concepts in the ontology into a ‘spectrum’ of keywords, and (3) a component to analyze the spectrum of a set of retrieved documents (e.g., through latent semantic indexing). The keyword spectrum derived from the ontology is compared with the spectrum of the retrieved documents, and if a mismatch is found, the document spectrum is fed back through WordNet to the ontology component. We show an example of how this mechanism leads to the detection of an anomalous state of knowledge, resulting in a new concept to augment the ontology.
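The comparison of the two spectra can be illustrated with a minimal sketch. It is not the PROFILE implementation: a small hand-written lexicon (`TOY_LEXICON`, hypothetical) stands in for WordNet, a plain relative term-frequency count stands in for latent semantic indexing, and the function names, stopword list, sample documents, and threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps an ontology concept to related keywords.
TOY_LEXICON = {
    "car": {"car", "automobile", "vehicle", "engine"},
    "fuel": {"fuel", "petrol", "gasoline"},
}

STOPWORDS = {"the", "a", "of", "and", "is", "for", "in", "with", "on"}


def ontology_spectrum(concepts, lexicon):
    """Expand ontology concepts into a set of keywords (the ontology 'spectrum')."""
    spectrum = set()
    for concept in concepts:
        # Unknown concepts contribute only their own name.
        spectrum |= lexicon.get(concept, {concept})
    return spectrum


def document_spectrum(documents, stopwords):
    """Relative term frequencies over the retrieved documents.

    A simple frequency count is used here instead of latent semantic indexing.
    """
    counts = Counter(
        word
        for doc in documents
        for word in doc.lower().split()
        if word not in stopwords
    )
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def mismatch(onto_spec, doc_spec, threshold):
    """Terms that are prominent in the documents but absent from the ontology spectrum."""
    return sorted(
        term
        for term, weight in doc_spec.items()
        if weight >= threshold and term not in onto_spec
    )


docs = [
    "the car engine runs on petrol",
    "a hybrid car saves fuel",
    "hybrid engine design for the automobile",
]

onto = ontology_spectrum(["car", "fuel"], TOY_LEXICON)
docs_spec = document_spectrum(docs, STOPWORDS)
candidates = mismatch(onto, docs_spec, threshold=0.15)
print(candidates)  # ['hybrid']
```

In this toy run the term "hybrid" is frequent in the retrieved documents but unreachable from the concepts in the ontology, so it surfaces as a candidate for a new concept. In the system described above, such a candidate would be fed back through WordNet to the ontology component rather than simply printed.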
