On the efficient evaluation of relaxed queries in biological databases

This paper presents a new technique to support query relaxation in biological databases. Query relaxation is needed because users often cannot express their queries exactly, especially in scientific databases such as biological databases, where complex domain knowledge is heavily involved. To address this problem, we propose the concept of fuzzy equivalence classes to capture important kinds of domain knowledge used to relax queries. This concept is then integrated with canonical pattern-searching techniques such as position trees and automata theory. As a result, the fuzzy queries produced through relaxation can be evaluated efficiently. The method has been successfully applied in a practical biological database, the GPCRDB.
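To make the idea of relaxing a query through fuzzy equivalence classes more concrete, the following is a minimal sketch, not the paper's actual method. It assumes that residues are grouped into classes carrying a similarity degree, that a relaxed match scores each window by its weakest position, and that a plain sliding-window scan stands in for the position tree and automaton machinery mentioned above. The class contents, the degrees, and the names `FUZZY_CLASSES`, `similarity`, and `relaxed_search` are all hypothetical.

```python
# Hypothetical fuzzy equivalence classes: (members, similarity degree).
# Groupings and degrees are illustrative assumptions, not taken from the paper.
FUZZY_CLASSES = [
    (frozenset("ILVM"), 0.8),  # aliphatic / hydrophobic residues
    (frozenset("FWY"), 0.7),   # aromatic residues
    (frozenset("KRH"), 0.7),   # positively charged residues
    (frozenset("DE"), 0.9),    # negatively charged residues
]


def similarity(a, b):
    """Return 1.0 for identical symbols, the class degree if both symbols
    share a class, and 0.0 otherwise."""
    if a == b:
        return 1.0
    return max(
        (degree for members, degree in FUZZY_CLASSES
         if a in members and b in members),
        default=0.0,
    )


def relaxed_search(pattern, text, threshold):
    """Scan every window of the text (naive sliding window, not a position
    tree) and keep windows whose weakest position reaches the threshold."""
    if not pattern:
        return []
    m = len(pattern)
    hits = []
    for i in range(len(text) - m + 1):
        # Fuzzy conjunction: the window is only as good as its worst position.
        score = min(similarity(p, t) for p, t in zip(pattern, text[i:i + m]))
        if score >= threshold:
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    print(relaxed_search("LKD", "AVRDLKE", 0.7))
```

Lowering the threshold relaxes the query further, while a threshold of 1.0 reduces it to exact matching. An index such as a position tree would replace the linear scan in a practical setting.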
