A Bayesian Network Model for Information Retrieval from Greek Texts

This paper describes a Bayesian network approach to information retrieval (IR) from natural language texts in Greek. The network structure provides an intuitive representation of uncertainty relationships, and inference algorithms use the embedded conditional probability tables to identify documents relevant to the user's needs, which are expressed as Boolean queries. Our research has been directed toward constructing a probabilistic IR framework that focuses on helping users perform ad hoc retrieval of Greek documents from the domain of economics. Users can also supply feedback on the relevance of the retrieved documents to improve performance on subsequent requests. Toward these goals, we have developed a Bayesian network IR system and tested it on several web corpora from different application domains. We have developed two approaches to the network structure: a simple one, where the structure is provided manually, and an automated one, where data mining is used to extract the structure. The results show satisfactory performance in terms of precision-recall curves.
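
As a rough illustration of how a Boolean query can be scored probabilistically, the following Python sketch ranks documents by the belief that each one satisfies the query. It is not the system described above: it assumes a two-layer document-to-term network with independent term nodes, so that AND, OR, and NOT reduce to the closed-form operators familiar from inference-network retrieval. The corpus, the probability values, and the names `TERM_BELIEF`, `DEFAULT_BELIEF`, `belief`, and `rank` are all hypothetical.

```python
from math import prod

# Hypothetical P(term is relevant | document) values; illustrative only,
# not taken from the paper or from any real corpus.
TERM_BELIEF = {
    "doc1": {"inflation": 0.8, "bank": 0.6, "export": 0.1},
    "doc2": {"inflation": 0.2, "bank": 0.9, "export": 0.7},
    "doc3": {"inflation": 0.1, "bank": 0.1, "export": 0.9},
}
DEFAULT_BELIEF = 0.05  # assumed belief for terms absent from a document


def belief(query, doc):
    """Return the belief that `doc` satisfies `query`.

    `query` is either a term (str) or a tuple (op, sub1, sub2, ...),
    where op is "and", "or", or "not". Term nodes are assumed
    independent given the document, which is a simplification.
    """
    if isinstance(query, str):
        return TERM_BELIEF[doc].get(query, DEFAULT_BELIEF)
    op, *args = query
    values = [belief(arg, doc) for arg in args]
    if op == "and":
        return prod(values)
    if op == "or":
        return 1.0 - prod(1.0 - v for v in values)
    if op == "not":
        return 1.0 - values[0]
    raise ValueError(f"unknown operator: {op}")


def rank(query):
    """Order documents by decreasing belief for the query."""
    return sorted(TERM_BELIEF, key=lambda d: belief(query, d), reverse=True)


# inflation AND (bank OR export)
QUERY = ("and", "inflation", ("or", "bank", "export"))
ranking = rank(QUERY)
```

Here `ranking` lists the toy documents from most to least likely to satisfy the query. A learned network structure or relevance feedback would change how the term beliefs are obtained, which this sketch does not attempt to model.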
