Two-Dimensional Filters for Structured Text

This paper introduces a method for defining filters for structured text. In the method, the structure of the text is defined by a grammar consisting of a set of productions. To describe an information interest, a two-dimensional template is first created interactively from the grammar, showing the structure of a set of textual elements at a chosen level of detail. The template depicts the hierarchical structure of the elements and also indicates optionality, alternatives, and iteration in the structure. The template is then filled in with constraints and annotations. Constraints can place conditions on the content of parts, on the position of a part within an ordered set of parts, and on the number of parts that satisfy a specified property. In a compound filter, several templates are connected by annotations. The method is intended as a theoretical framework for developing flexible and powerful graphical interfaces for filtering structured text. A prototype implementation is described.
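As a rough illustration of the ideas summarized above, the following Python sketch expands a small grammar into a nested template at a chosen level of detail and then checks content, position, and count constraints against a parsed element. It is not the paper's notation or its prototype: the grammar, the `*` and `?` suffixes for iteration and optionality, and the names `GRAMMAR`, `Node`, `template`, `Constraint`, and `matches` are all assumptions made for this example. Alternatives and annotations are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical grammar: each production maps an element to its parts.
# A "*" suffix marks iteration and "?" marks optionality (assumed notation).
GRAMMAR = {
    "article": ["title", "author*", "abstract?", "section*"],
    "section": ["heading", "paragraph*"],
}


@dataclass
class Node:
    """A parsed textual element with its text and child elements."""
    name: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


def template(symbol: str, depth: int) -> dict:
    """Expand a grammar symbol into a nested template, `depth` levels deep."""
    name = symbol.rstrip("*?")
    marker = symbol[len(name):]
    parts = GRAMMAR.get(name, []) if depth > 0 else []
    return {
        "name": name,
        "marker": marker,
        "parts": [template(p, depth - 1) for p in parts],
    }


@dataclass
class Constraint:
    """A condition on the child parts of one element (illustrative only)."""
    part: str                                        # name of the constrained part
    content: Optional[Callable[[str], bool]] = None  # condition on the part's text
    position: Optional[int] = None                   # 1-based position among same-named parts
    min_count: int = 1                               # parts that must satisfy the condition


def matches(node: Node, constraints: List[Constraint]) -> bool:
    """Return True if the element satisfies every constraint."""
    for c in constraints:
        candidates = [ch for ch in node.children if ch.name == c.part]
        if c.position is not None:
            # Keep only the part at the requested position, if it exists.
            candidates = candidates[c.position - 1:c.position]
        hits = [p for p in candidates if c.content is None or c.content(p.text)]
        if len(hits) < c.min_count:
            return False
    return True


# Made-up sample element for the sketch.
doc = Node("article", children=[
    Node("title", "Filters for structured text"),
    Node("author", "A. Author"),
    Node("author", "B. Writer"),
])

first_author_is_a = Constraint("author", content=lambda t: t == "A. Author", position=1)
at_least_two_authors = Constraint("author", min_count=2)
```

Here `matches(doc, [first_author_is_a, at_least_two_authors])` combines a position constraint with a count constraint, and `template("article", 1)` stops at the direct parts of `article` without expanding `section` any further.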
