Concept-based information access

Concept-based access to information promises important benefits over keyword-based access. One benefit is the ability to exploit semantic relationships among concepts when finding relevant documents. Another is the elimination of irrelevant documents by identifying conceptual mismatches. Concepts are mental structures; words and phrases are their linguistic representatives. Because natural language is inherently concise, a single word can represent multiple concepts, and different words may represent the same or very similar concepts. Word sense disambiguation attempts to resolve this ambiguity using contextual information. An ontology, given a key concept, facilitates the identification of related concepts and their linguistic representatives. Latent semantic analysis, on the other hand, attempts to reveal hidden conceptual relationships among words and phrases based on linguistic usage patterns. In this work we explore the potential of concept-based information access via two methods: ontologies and latent semantic analysis. We examine the circumstances under which concept-based access becomes feasible and improves the user experience.
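The difference between keyword-based and ontology-driven access can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy `ONTOLOGY` dictionary, the sample `DOCS`, and the helper names `expand`, `keyword_search`, and `concept_search` are hypothetical and do not reproduce the system studied in this work. The sketch expands a key concept into its linguistic representatives plus those of directly related concepts, then matches documents against the expanded term set.

```python
# Toy ontology (hypothetical data): each concept lists the words that
# express it and the concepts it is directly related to.
ONTOLOGY = {
    "automobile": {"terms": {"car", "automobile", "auto"}, "related": {"vehicle"}},
    "vehicle": {"terms": {"vehicle"}, "related": {"automobile"}},
    "river_bank": {"terms": {"bank", "riverbank", "shore"}, "related": set()},
}

# Sample documents (hypothetical data).
DOCS = [
    "the car was parked outside",
    "an automobile dealer opened downtown",
    "the vehicle needs new tires",
    "we walked along the river shore",
]


def expand(concept):
    """Return the terms of a concept plus the terms of its related concepts."""
    entry = ONTOLOGY[concept]
    terms = set(entry["terms"])
    for other in entry["related"]:
        terms |= ONTOLOGY[other]["terms"]
    return terms


def keyword_search(word, docs):
    """Return indices of documents that contain the exact word."""
    return [i for i, doc in enumerate(docs) if word in doc.lower().split()]


def concept_search(concept, docs):
    """Return indices of documents that contain any term of the expanded concept."""
    terms = expand(concept)
    return [i for i, doc in enumerate(docs) if terms & set(doc.lower().split())]


if __name__ == "__main__":
    print(keyword_search("car", DOCS))         # exact-word matches only
    print(concept_search("automobile", DOCS))  # matches via expanded terms
```

In this toy setting the keyword query for "car" retrieves only the first document, while the concept query for "automobile" also retrieves the documents that mention "automobile" and "vehicle". The sketch expands only one level of related concepts and does not disambiguate word senses, so a real system would need both a richer ontology and a disambiguation step.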
