Mediating Among Diverse Data Formats

Abstract: The growth of the Internet and other global networks has made large quantities of data available in a wide variety of formats. Unfortunately, most programs can interpret only a small number of formats and cannot take advantage of data in unfamiliar ones. As the Internet grows, new applications arise, and legacy data persists, the diversity of formats will continue to increase, worsening the problem. Current approaches to data diversity either fail to scale up gracefully or fail to handle the full heterogeneity of data and data sources found on the Internet. I have developed a data model and a system of mediator agents that support the widespread use of diverse data formats much more effectively than current approaches do. In this thesis, I describe and evaluate the design and implementation of this data model, known as the Typed Object Model (TOM), and of the system of mediators that supports it. TOM is a read-only, object-oriented data model that describes the abstract structure of data formats, their concrete representations, and the relations between formats. TOM is supported by a distributed network of mediator agents, known as type brokers, that maintain information about data formats and provide uniform access to conversions and other operations on those formats. Type brokers plan complex conversion strategies that can involve multiple servers, and they ensure that conversions preserve the information clients need. Data providers can also register new formats, operations, and conversions with type brokers in a decentralized manner, making them usable anywhere on the Internet. TOM type brokers now work with hundreds of data formats, often by integrating off-the-shelf programs. TOM also supports a wide variety of applications and interfaces, such as the Web-based TOM Conversion Service, which has users worldwide.
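The abstract describes type brokers that accept decentrally registered conversions and plan multi-step conversion strategies across servers while preserving the information a client needs. The sketch below illustrates that idea under stated assumptions and is not TOM's actual design. It assumes each registered conversion records the properties it discards (the `loses` set). It plans with breadth-first search over formats, so it returns the route with the fewest steps that discards nothing the client asked to keep. `TypeBroker`, `Conversion`, `register`, `plan`, the property names, and the server names are all hypothetical; the format names are illustrative media types.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Conversion:
    """Hypothetical model of one registered conversion (not TOM's real schema)."""
    source: str                      # source format name
    target: str                      # target format name
    server: str                      # host assumed to offer this conversion
    loses: frozenset = frozenset()   # assumed: properties this step discards


@dataclass
class TypeBroker:
    """Toy broker: a registry of conversions plus a conversion planner."""
    conversions: list = field(default_factory=list)

    def register(self, conv: Conversion) -> None:
        # Any provider may add a conversion; no central approval is modeled.
        self.conversions.append(conv)

    def plan(self, source: str, target: str, preserve=frozenset()):
        """Return the shortest chain of conversions from source to target
        that never discards a property in `preserve`, or None if none exists."""
        preserve = frozenset(preserve)
        # Drop every step that would lose information the client needs.
        usable = [c for c in self.conversions if not (c.loses & preserve)]
        # Breadth-first search, so the first route found has the fewest steps.
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            fmt, path = queue.popleft()
            if fmt == target:
                return path
            for c in usable:
                if c.source == fmt and c.target not in seen:
                    seen.add(c.target)
                    queue.append((c.target, path + [c]))
        return None


if __name__ == "__main__":
    broker = TypeBroker()
    broker.register(Conversion("application/x-latex", "application/x-dvi", "server-a"))
    broker.register(Conversion("application/x-dvi", "application/postscript", "server-b"))
    broker.register(Conversion("application/postscript", "text/plain", "server-c",
                               frozenset({"layout"})))
    broker.register(Conversion("application/x-latex", "text/plain", "server-d",
                               frozenset({"math"})))
    # The direct route loses "math", so the broker chains three servers instead.
    route = broker.plan("application/x-latex", "text/plain", preserve={"math"})
    print([c.server for c in route])  # ['server-a', 'server-b', 'server-c']
```

In this sketch, a client that does not care about mathematical content receives the single direct conversion. A client that must keep it is routed through three servers. Breadth-first search is only one possible planning strategy. A broker could instead weigh cost, server availability, or degree of fidelity.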
