CarStream: An Industrial System of Big Data Processing for Internet-of-Vehicles

As Internet-of-Vehicles (IoV) technology becomes an increasingly important trend for future transportation, designing large-scale IoV systems that process big data uploaded by fleet vehicles and provide data-driven services has become a critical task. IoV data, especially high-frequency vehicle statuses (e.g., location, engine parameters), are characterized by large volume, low value density, and low data quality. These characteristics pose challenges for developing real-time applications on top of the data. In this paper, we address the challenges of designing a scalable IoV system by describing CarStream, an industrial big data processing system for chauffeured car services. Connected with over 30,000 vehicles, CarStream collects and processes multiple types of driving data, including vehicle status, driver activity, and passenger-trip information, and provides multiple services based on the collected data. CarStream has been deployed and maintained in industrial use for three years, collecting over 40 terabytes of driving data. This paper shares our experience in designing CarStream around large-scale driving-data streams and the lessons learned in addressing the challenges of designing and maintaining it.
