Data Engineering

Maintaining the integrity of data and keeping it accessible are crucial tasks in database systems. Although each component in the storage hierarchy can be fairly reliable, a large collection of such components is prone to failure; this is especially true of the secondary storage system, which normally contains a large number of magnetic disks. In designing a fault-tolerant secondary storage system, one should keep in mind that failures, although potentially devastating, are expected to occur fairly infrequently; hence, it is important to provide reliability techniques that do not significantly hinder the system’s performance during normal operation. Furthermore, it is desirable to maintain a reasonable level of performance under failure as well. Since high degrees of reliability are traditionally achieved through the use of duplicate components and redundant information, it is also reasonable to use these redundancies to improve the system’s performance during normal operation. In this article we concentrate on techniques for improving the reliability of secondary storage systems, as well as the resulting system performance during normal operation and under failure.
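
The claim that a large collection of individually reliable disks is prone to failure can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the article; it assumes independent disks with exponentially distributed lifetimes, and the function names and sample figures (a 100,000-hour disk MTTF, a 24-hour repair time) are hypothetical.

```python
def array_mttf_hours(disk_mttf_hours: float, num_disks: int) -> float:
    """Mean time to the FIRST disk failure in an array without redundancy.

    Assumes independent disks with exponential lifetimes, so the failure
    rates add and the array MTTF is the single-disk MTTF divided by N.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


def mirrored_pair_mttdl_hours(disk_mttf_hours: float, mttr_hours: float) -> float:
    """Approximate mean time to data loss for one mirrored pair.

    Data is lost only if the second disk fails while the first is being
    repaired; the usual approximation (valid when MTTR << MTTF) is
    MTTF^2 / (2 * MTTR).
    """
    if mttr_hours <= 0:
        raise ValueError("mttr_hours must be positive")
    return disk_mttf_hours ** 2 / (2.0 * mttr_hours)


if __name__ == "__main__":
    disk_mttf = 100_000.0  # hypothetical single-disk MTTF, in hours
    print(array_mttf_hours(disk_mttf, 100))            # 100 disks, no redundancy
    print(mirrored_pair_mttdl_hours(disk_mttf, 24.0))  # one mirrored pair
```

Under these assumptions, 100 disks without redundancy see a first failure about every 1,000 hours, while a single mirrored pair is expected to lose data only after roughly 2 × 10^8 hours. That gap is why redundancy is the traditional route to high reliability.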
