MonetDB/DataCell: leveraging the column-store database technology for efficient and scalable stream processing

The research of Erietta Liarou shows that it is possible to design large data systems that handle incoming data streams and combine them with existing data. Liarou focused on how to build generic data management systems that adapt their processing model to the way data and queries become available. Today, online analytics applications must cope with fast data streams. New mobile applications, for example, try to exploit data streams for advertising and routing. Similarly, large-scale cloud infrastructures require continuous monitoring to ensure stability and to fend off cyberattacks. Scientific databases and web log analysis require efficient data processing to support decision making. Current database and streaming technology cannot yet adequately handle continuous queries (queries that remain active for a long time), nor can it quickly analyze large data streams in combination with, and in comparison to, information that is already stored. Database systems lack the functionality to process continuous queries, and streaming systems do not scale. Liarou sought a solution to this problem by combining the best properties of both worlds.
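A continuous query, as described above, is registered once and then evaluated repeatedly as new data arrives, instead of running once over a fixed table. The Python sketch below illustrates only that general idea: each new batch of stream tuples is joined with stored data and filtered. It is not DataCell's implementation. The names `ContinuousQuery`, `THRESHOLDS`, `above_threshold` and the sensor readings are hypothetical, and a plain dictionary stands in for a stored database table.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stored table: sensor id -> alert threshold.
# A dict stands in for a persistent database table.
THRESHOLDS: Dict[str, float] = {"s1": 50.0, "s2": 75.0}

Reading = Tuple[str, float]  # (sensor id, measured value)


@dataclass
class ContinuousQuery:
    """A query registered once and re-evaluated on every batch of new stream tuples."""
    predicate: Callable[[Reading, Dict[str, float]], bool]
    results: List[Reading] = field(default_factory=list)

    def on_batch(self, batch: Iterable[Reading], stored: Dict[str, float]) -> List[Reading]:
        # Only the newly arrived tuples are evaluated; each is joined with the
        # stored table by sensor id, and tuples without a match are dropped.
        emitted = [r for r in batch if r[0] in stored and self.predicate(r, stored)]
        self.results.extend(emitted)
        return emitted


def above_threshold(reading: Reading, stored: Dict[str, float]) -> bool:
    # Compare the streamed value with the stored threshold for that sensor.
    sensor_id, value = reading
    return value > stored[sensor_id]


query = ContinuousQuery(above_threshold)
incoming = [
    [("s1", 42.0), ("s2", 80.0)],  # first batch
    [("s1", 55.5), ("s3", 99.0)],  # second batch; s3 has no stored threshold
]
for batch in incoming:
    print(query.on_batch(batch, THRESHOLDS))  # prints [('s2', 80.0)], then [('s1', 55.5)]
```

Evaluating per batch rather than per tuple is an assumption of this sketch: the stored data stays in place while the same predicate is rerun over each new batch, and `results` accumulates everything the query has emitted so far.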
