Data wrapping on the World Wide Web

In this thesis, I designed and implemented the Generic Screen Scraper, a tool that generates data wrappers to extract requested data from data sources on the World-Wide Web. Data wrappers shield users from interacting directly with heterogeneous data sources (SQL or non-SQL) by allowing all queries to be issued in Structured Query Language (SQL), based on the relational data model. Structured or semi-structured data sources on the World-Wide Web can be made scrapable by the Generic Screen Scraper, as long as they are registered according to a set of specifications.

Thesis Supervisor: Michael D. Siegel
Title: Principal Research Scientist, Sloan School of Management
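To make the wrapper idea concrete, the following is a minimal sketch, not the Generic Screen Scraper itself. It assumes a hypothetical registration record (`SPEC`) that names a relation, its columns, and a regular expression with one capture group per column; the thesis's actual specification format is not given here. The page content, the `wrap` function, and the `quotes` relation are likewise invented for illustration. The sketch extracts tuples from a semi-structured page and loads them into an in-memory relational table, so that the source can be queried in SQL.

```python
import re
import sqlite3

# Hypothetical registration record (assumption: the real specification
# format used by the thesis is not shown in this abstract).
SPEC = {
    "relation": "quotes",
    "columns": ["ticker", "price"],
    # One capture group per column, in column order.
    "pattern": r"<tr><td>(\w+)</td><td>([\d.]+)</td></tr>",
}

# Stand-in for a page fetched from the Web (made-up data).
PAGE = """
<table>
<tr><td>AAA</td><td>10.5</td></tr>
<tr><td>BBB</td><td>20.0</td></tr>
</table>
"""


def wrap(page, spec):
    """Extract tuples from the page as the spec describes and
    expose them as a table in an in-memory SQLite database."""
    rows = re.findall(spec["pattern"], page)
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(spec["columns"])
    conn.execute(f"CREATE TABLE {spec['relation']} ({cols})")
    marks = ", ".join("?" for _ in spec["columns"])
    conn.executemany(
        f"INSERT INTO {spec['relation']} VALUES ({marks})", rows
    )
    return conn


conn = wrap(PAGE, SPEC)
# The user issues ordinary SQL and never touches the HTML directly.
# Extracted values are stored as text, hence the explicit cast.
result = conn.execute(
    "SELECT ticker FROM quotes WHERE CAST(price AS REAL) > 15"
).fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM quotes").fetchone()[0]
```

In this sketch the registration record is the only source-specific part; a different page would need only a different `SPEC`, which is one plausible reading of what registering a source "following some specifications" could mean.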
