Content-Based Document Retrieval Using Natural Language

A system for the content-based querying of large databases containing documents of different classes (texts, images, image sequences, etc.) is introduced. Queries are formulated in natural language (NL) and are evaluated for their semantic content. For document evaluation, a knowledge model consisting of a set of domain-specific concept interpretation methods is constructed. Thus, the semantics of the query and of the documents can be connected, i.e. the retrieval process searches for a match between the query and the document on the semantic level (not merely on the level of keywords or global image properties). Methods from fuzzy set theory are used to find the matches. Furthermore, the retrieval methods associate information from different document classes. To avoid the loss of information inherent in pre-indexing, documents need not be indexed; in principle, every search may be performed on the raw data under a given query. The system can therefore answer every query that can be expressed in the semantic model. To achieve the high data rates necessary for on-line analysis, dedicated VLSI search processors are being developed along with a parallel high-throughput media server. In the sequel, we outline the system architecture and detail specific aspects of the two modules which together implement natural language search: the natural language interface NatLink, which performs the syntactic analysis and constructs a formal semantic interpretation of the queries, and the subsequent fuzzy retrieval module, which establishes an operational model for concept-based NL interpretation.

1 Parts of the work reported here were funded by the ministry of science and research of the German state of Nordrhein-Westfalen within the collaborative research initiative “Virtual Knowledge Factory”. The group developing the HPQS system includes working groups at the universities of Aachen (T. G. Noll), Bielefeld (A. Knoll), Dortmund (J. Biskup), Hagen (H. Helbig), and Paderborn (B. Monien). HPQS is an acronym for “High Performance Query Server”.

2 This is a revised version of an earlier report (Knoll et al. (1998b)).
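The idea of matching a query against documents on the semantic level with fuzzy set methods can be illustrated with a minimal sketch. It is not the HPQS implementation: the concept names, the membership degrees, and the helper functions `fuzzy_and`, `fuzzy_not`, and `rank` are hypothetical, and the standard min/complement operators are assumed here only because they are the simplest choice. In the system described above, the membership degrees would be produced by concept interpretation methods applied to the raw documents.

```python
from typing import Dict, List, Tuple

# Hypothetical data model: each document maps concept names to a
# membership degree in [0, 1]. The values below are made up.
Memberships = Dict[str, float]


def fuzzy_and(*degrees: float) -> float:
    """Standard (minimum) fuzzy conjunction; 1.0 for an empty conjunction."""
    return min(degrees, default=1.0)


def fuzzy_not(degree: float) -> float:
    """Standard fuzzy complement."""
    return 1.0 - degree


def rank(
    documents: Dict[str, Memberships],
    required: List[str],
    excluded: List[str],
) -> List[Tuple[str, float]]:
    """Score each document against the query and sort by descending score.

    A concept that is missing from a document counts as membership 0.0.
    """
    scored = []
    for doc_id, mu in documents.items():
        positives = [mu.get(concept, 0.0) for concept in required]
        negatives = [fuzzy_not(mu.get(concept, 0.0)) for concept in excluded]
        scored.append((doc_id, fuzzy_and(*positives, *negatives)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative query: "cloudy over southern Germany, not at night".
documents = {
    "img_a": {"cloudy": 0.9, "southern_germany": 0.8},
    "img_b": {"cloudy": 0.4, "southern_germany": 1.0},
    "img_c": {"cloudy": 0.95, "southern_germany": 0.2, "night": 0.7},
}
ranking = rank(documents, ["cloudy", "southern_germany"], ["night"])
for doc_id, score in ranking:
    print(doc_id, score)
```

Because every document receives a graded score rather than a yes/no answer, partial matches remain visible in the ranking instead of being discarded. Other conjunction operators or fuzzy quantifiers could be substituted for `fuzzy_and` without changing the overall structure.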

[1]  Jianwei Zhang,et al.  Efficient learning of non-uniform B-splines for modelling and control , 1999 .

[2]  J.F.A.K. van Benthem Determiners and logic , 1983 .

[3]  David Jordan,et al.  The Object Database Standard: ODMG 2.0 , 1997 .

[4]  A. Knoll,et al.  A framework for evaluating fusion operators based on the theory of generalized quantifiers , 1999, Proceedings. 1999 IEEE/SICE/RSJ. International Conference on Multisensor Fusion and Integration for Intelligent Systems. MFI'99 (Cat. No.99TH8480).

[5]  Alexander Franz Learning PP attachment from corpus statistics , 1995, Learning for Natural Language Processing.

[6]  S Mehl,et al.  Statistische Verfahren zur Zuordnung von Präpositionalphrasen , 1998 .

[7]  Hermann Helbig,et al.  Realization of a User-friendly Access to Networked Information Retrieval Systems , 1996 .

[8]  Gloria Bordogna,et al.  Query term weights as constraints in fuzzy information retrieval , 1991, Inf. Process. Manag..

[9]  Marion Schulz Eine Werkbank zur interaktiven Erstellung semantikbasierter Computerlexika , 1998 .

[10]  Didier Dubois,et al.  Fuzzy information engineering: a guided tour of applications , 1997 .

[11]  Adwait Ratnaparkhi,et al.  A Maximum Entropy Model for Prepositional Phrase Attachment , 1994, HLT.

[12]  Images in weather forecasting , 1997 .

[13]  Ronald R. Yager,et al.  Counting the number of classes in a fuzzy set , 1993, IEEE Trans. Syst. Man Cybern..

[14]  Norbert Sensen,et al.  Algorithms for a job-scheduling problem within a parallel digital library , 1999, Proceedings of the 1999 International Conference on Parallel Processing.

[15]  Udi Manber,et al.  GLIMPSE: A Tool to Search Through Entire File Systems , 1994, USENIX Winter.

[16]  Johan van Benthem,et al.  Questions About Quantifiers , 1984, J. Symb. Log..

[17]  Alois Knoll,et al.  A System for the Content-Based Retrieval of Textual and Non-Textual Documents Using a Natural Language Interface , 1998 .

[18]  Panos Constantopoulos,et al.  Research and Advanced Technology for Digital Libraries , 2001, Lecture Notes in Computer Science.

[19]  Michael Collins,et al.  Prepositional Phrase Attachment through a Backed-off Model , 1995, VLC@ACL.

[20]  Marion Schulz,et al.  COLEX: Ein Computerlexikon für die automatische Sprachverarbeitung , 1996 .

[21]  Alexander Franz Automatic Ambiguity Resolution in Natural Language Processing , 1996, Lecture Notes in Computer Science.

[22]  Udo Hahn,et al.  Concurrent Lexicalized Dependency Parsing: The ParseTalk Model , 1994, COLING.

[23]  H Helbig Syntactic-semantic analysis of natural language by new word-class controlled functional analysis , 1986 .

[24]  Anca L. Ralescu,et al.  A note on rule representation in expert systems , 1986, Inf. Sci..

[25]  Manfred Pinkal,et al.  On the Limits of Lexical Meaning , 1983 .

[26]  John R. Smith,et al.  Search and Progressive Image Retrieval from Distributed Image/Video Databases: The SPIRE Project , 1998, ECDL.

[27]  Mats Rooth,et al.  Structural Ambiguity and Lexical Relations , 1991, ACL.

[28]  Edward L. Keenan,et al.  A semantic characterization of natural language determiners , 1986 .

[29]  Lotfi A. Zadeh,et al.  A Theory of Approximate Reasoning , 1979 .

[30]  Michael Eimermacher Wortorientiertes Parsen , 1988 .

[31]  Isabelle Bloch Information combination operators for data fusion: a comparative review with classification , 1996, IEEE Trans. Syst. Man Cybern. Part A.

[32]  Alexander S. Yeh,et al.  Some Properties of Preposition and Subordinate Conjunction Attachments , 1998, COLING-ACL.

[33]  Dragutin Petkovic,et al.  Query by Image and Video Content: The QBIC System , 1995, Computer.

[34]  R. Yager Connectives and quantifiers in fuzzy sets , 1991 .

[35]  D. Ralescu Cardinality, quantifiers, and the aggregation of fuzzy criteria , 1995 .

[36]  Donald H. Kraft,et al.  TIRS: a topological information retrieval system satisfying the requirements of the Waller-Kraft wish list , 1987, SIGIR '87.

[37]  Christoph Schwarze,et al.  Meaning, Use, and Interpretation of Language , 1983 .

[38]  Hermann Helbig,et al.  WORD AGENT BASED NATURAL LANGUAGE PROCESSING , 1999 .

[39]  J. AnneMiller The Balancing act , 1976 .

[40]  J. Zhang,et al.  Image retrieval for information systems , 1995, Electronic Imaging.

[41]  Ronald R. Yager,et al.  On ordered weighted averaging aggregation operators in multicriteria decisionmaking , 1988, IEEE Trans. Syst. Man Cybern..

[42]  Ellen Riloff,et al.  Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing , 1996, Lecture Notes in Computer Science.

[43]  Alberto Del Bimbo,et al.  Visual Image Retrieval by Elastic Matching of User Sketches , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[44]  H. Ritter,et al.  A Framework for Evaluating Approaches to Fuzzy Quantification , 1999 .

[45]  Alois Knoll,et al.  Fuzzy Quantifiers for Processing Natural-Language Queries in Content-Based Multimedia Retrieval Systems , 1997 .

[46]  S. Sitharama Iyengar,et al.  Guest Editors' Introduction: Image Databases , 1988, IEEE Trans. Software Eng..

[47]  Alois Knoll,et al.  Natural language navigation in multimedia archives: an integrated approach , 1999, MULTIMEDIA '99.

[48]  D. Dubois,et al.  Fuzzy cardinality and the modeling of imprecise quantification , 1985 .

[49]  Joachim Biskup,et al.  An Integrated Approach to Semantic Evaluation and Content-Based Retrieval of Multimedia Documents , 1998, ECDL.

[50]  Ingo Glöckner DFS - An Axiomatic Approach to Fuzzy Quantification , 1997 .

[51]  Alberto Del Bimbo,et al.  A Three-Dimensional Iconic Environment for Image Database Querying , 1993, IEEE Trans. Software Eng..

[52]  Adwait Ratnaparkhi Statistical Models for Unsupervised Prepositional Phrase Attachment , 1998, COLING.

[53]  Alois Knoll,et al.  Application of fuzzy quantifiers in image processing: a case study , 1999, 1999 Third International Conference on Knowledge-Based Intelligent Information Engineering Systems. Proceedings (Cat. No.99TH8410).

[54]  Norbert Fuhr,et al.  Probabilistic Datalog—a logic for powerful retrieval methods , 1995, SIGIR '95.

[55]  W. Silvert Symmetric Summation: A Class of Operations on Fuzzy Sets , 1979 .

[56]  Makoto Nagao,et al.  Corpus Based PP Attachment Ambiguity Resolution with a Semantic Dictionary , 1997, VLC.

[57]  J. Barwise,et al.  Generalized quantifiers and natural language , 1981 .

[58]  Lotfi A. Zadeh,et al.  A COMPUTATIONAL APPROACH TO FUZZY QUANTIFIERS IN NATURAL LANGUAGES , 1983 .

[59]  Sven Hartrumpf,et al.  Hybrid Disambiguation of Prepositional Phrase Attachment and Interpretation , 1999, EMNLP.

[60]  Hermann Helbig,et al.  Word Class Functions for Syntactic-Semantic Analysis , 1997 .

[61]  李幼升,et al.  Ph , 1989 .

[62]  K. R. Hardy,et al.  Threshold functions for automated cloud analyses of global meteorological satellite imagery , 1995 .

[63]  Etienne E. Kerre,et al.  An overview of fuzzy quantifiers. (I). Interpretations , 1998, Fuzzy Sets Syst..

[64]  Sven Hartrumpf Partial Evaluation for Efficient Access to Inheritance Lexicons , 1998, ArXiv.

[65]  Eric Brill,et al.  A Rule-Based Approach to Prepositional Phrase Attachment Disambiguation , 1994, COLING.

[66]  Alois Knoll,et al.  Data Fusion Based on Fuzzy Quantifiers , 1998 .

[67]  Alois Knoll,et al.  Query Evaluation and Information Fusion in a Retrieval System for Multimedia Documents , 1999 .

[68]  Gloria Bordogna,et al.  A Fuzzy Linguistic Approach Generalizing Boolean Information Retrieval: A Model and Its Evaluation , 1993, J. Am. Soc. Inf. Sci..

[69]  Marion Schulz,et al.  Knowledge Representation with MESNET - A Multilayered Extended Semantic Network , 1996 .