Solving the Fragment Complexity of Official, Social, and Sensorial Urban Data

Cities in the big data era hold massive amounts of urban data from which valuable information and digitally enhanced services can be created. Urban data sources are generally categorized into three types, official, social, and sensorial, which originate respectively from governments and enterprises, citizens' social networks, and sensor networks. Although these types differ significantly from one another, they are consolidated to support smart urban services. Given existing sophisticated consolidation approaches, we argue that state-of-the-art urban data management overlooks a new challenge, fragment complexity: well-integrated data that has an appropriate but fragmentary schema and is therefore difficult to query. In contrast to a predefined, rigid schema, a fragmentary schema means that a dataset contains millions of attributes distributed nonorthogonally among tables, and the values of these attributes are more massive still. For a query, the first problem encountered is locating where these attributes are stored, and traditional value-based query optimization does not help with this. To address the problem, we propose an attribute-oriented optimization: an index over massive numbers of attributes, called the attribute index. The attribute index is a secondary index that locates the files in which the target attributes are stored. It has three parts: ATree for searching keys, DTree for locating keys among files, and ADLinks, a mapping table between ATree and DTree. This paper describes the index architecture, its logical structure and algorithms, the implementation details, the creation process, its integration with an existing key-value store, and an urban application scenario. Experiments show that ATree answers queries 1.1x, 1.5x, and 1.2x faster than B+-Tree, LSM-Tree, and AVL-Tree, respectively. Finally, we integrate our approach with HBase to build UrbanBase, whose query performance is 1.3x faster than that of the original HBase.
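The abstract names the three parts of the attribute index without detailing them, so the following Python sketch only illustrates the lookup path they imply: given a set of attribute names, find which files must be opened before any value-based processing starts. It rests on simplifying assumptions that are not the authors' design: the ATree is modeled as a sorted list searched with `bisect`, the DTree as a plain dictionary from file ids to file names, and ADLinks as a dictionary from attribute names to sets of file ids. The class `AttributeIndex`, its methods, and the example file names are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


class AttributeIndex:
    """Toy secondary index answering "which files store attribute X?".

    Hypothetical stand-ins, not the paper's implementation:
      ATree   -> sorted list of attribute names, searched with bisect
      DTree   -> dict mapping file id to file name
      ADLinks -> dict mapping attribute name to the ids of files storing it
    """

    def __init__(self):
        self.atree = []                  # sorted, unique attribute names
        self.dtree = {}                  # file id -> file name
        self.adlinks = defaultdict(set)  # attribute -> {file id, ...}

    def add_file(self, file_id, file_name, attributes):
        """Register a file together with the attributes (columns) it stores."""
        self.dtree[file_id] = file_name
        for attr in attributes:
            i = bisect_left(self.atree, attr)
            if i == len(self.atree) or self.atree[i] != attr:
                self.atree.insert(i, attr)  # keep the ATree stand-in sorted
            self.adlinks[attr].add(file_id)

    def _contains(self, attr):
        """Binary search for an exact attribute name."""
        i = bisect_left(self.atree, attr)
        return i < len(self.atree) and self.atree[i] == attr

    def locate(self, attributes):
        """Map each requested attribute to the sorted names of files storing it."""
        result = {}
        for attr in attributes:
            if self._contains(attr):
                result[attr] = sorted(self.dtree[f] for f in self.adlinks[attr])
            else:
                result[attr] = []  # attribute is not stored in any file
        return result

    def with_prefix(self, prefix):
        """Ordered range scan: every indexed attribute starting with prefix."""
        out = []
        i = bisect_left(self.atree, prefix)
        while i < len(self.atree) and self.atree[i].startswith(prefix):
            out.append(self.atree[i])
            i += 1
        return out


# Attributes from official, social, and sensorial sources spread over files.
idx = AttributeIndex()
idx.add_file(1, "official_0001.dat", ["district", "population"])
idx.add_file(2, "social_0042.dat", ["district", "checkins"])
idx.add_file(3, "sensor_0007.dat", ["pm10", "pm25", "noise_db"])

print(idx.locate(["district", "pm25", "rainfall"]))
# {'district': ['official_0001.dat', 'social_0042.dat'], 'pm25': ['sensor_0007.dat'], 'rainfall': []}
print(idx.with_prefix("pm"))
# ['pm10', 'pm25']
```

Keeping the attribute names ordered, rather than only hashing them, is what allows the range scan in `with_prefix`; a real index over millions of attributes would replace the sorted list with a balanced or disk-resident tree.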
