A probabilistic data model and algebra for location-based data warehouses and their implementation

This paper proposes a novel probabilistic data model and algebra that improves the modeling and querying of uncertain data in spatial OLAP (SOLAP) to support location-based services. Data warehouses that support location-based services need to combine complex hierarchies, such as road networks or transportation infrastructures, with static content, e.g., speed limits, and dynamic content, e.g., vehicle positions. Both the hierarchies and the content are often uncertain in real-world applications. Our model supports the use of probability distributions within both facts and dimensions. We give an algebra that correctly aggregates uncertain data over uncertain hierarchies. This paper also describes an implementation of the model and algebra, gives a complexity analysis of the algebra, and reports on an experimental evaluation of the implementation. The work is motivated by a real-world case study based on our collaboration with a leading Danish vendor of location-based services.
