Paper Information - Design and Implementation of Real-Time Video Big Data Platform based on Spark Streaming

Video data from monitoring networks, intelligent transportation networks, smart cities, and other fields has gradually become an important component of big data. In this paper, we use Kafka's producer-consumer model as the video streaming data acquisition layer, the real-time processing framework Spark Streaming combined with OpenCV as the data processing layer, memory, HDFS, or HBase as the data storage layer, and web technology to display the final results. During transmission, we use Base64 encoding and JSON to implement data conversion. The framework is thus designed to achieve real-time data acquisition, processing, transcoding, storage, and display. Face recognition on captured video was used as an example to build and test the framework. We tested the relationship among the number of Worker nodes, the batch slice, and the processing efficiency. The test results showed that the framework was feasible and had good real-time processing ability, and that better performance could be achieved by adjusting the CPU usage rate. This research has value for real-time processing in image recognition, image retrieval, and other big data applications. Mining the data more deeply, however, will require deep learning methods such as CNNs. Recently, SparkNet, a framework for training deep networks in Spark, has provided support for neural networks. SparkNet aims to make the traditionally time-consuming training of deep networks considerably more efficient.
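The Base64 and JSON conversion step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `encode_frame` and `decode_frame` and the message fields `camera_id`, `timestamp_ms`, and `data` are hypothetical, and plain bytes stand in for an OpenCV-encoded frame so that the example needs only the Python standard library.

```python
import base64
import json
from typing import Tuple


def encode_frame(frame_bytes: bytes, camera_id: str, timestamp_ms: int) -> str:
    """Wrap raw frame bytes in a JSON text message (field names are assumptions)."""
    message = {
        "camera_id": camera_id,
        "timestamp_ms": timestamp_ms,
        # Base64 turns arbitrary bytes into ASCII text that JSON can carry.
        "data": base64.b64encode(frame_bytes).decode("ascii"),
    }
    return json.dumps(message)


def decode_frame(message_json: str) -> Tuple[bytes, str, int]:
    """Recover the original frame bytes and metadata from a JSON message."""
    message = json.loads(message_json)
    frame_bytes = base64.b64decode(message["data"])
    return frame_bytes, message["camera_id"], message["timestamp_ms"]


if __name__ == "__main__":
    # Stand-in for an encoded video frame (e.g. JPEG bytes from a camera).
    fake_frame = bytes(range(256))
    payload = encode_frame(fake_frame, "cam-01", 1700000000000)
    restored, camera_id, timestamp_ms = decode_frame(payload)
    # The round trip should return exactly the bytes that went in.
    print(restored == fake_frame, camera_id, timestamp_ms)
```

In a pipeline like the one described, a string of this kind would be what a producer publishes and what the processing layer decodes before handing the bytes to an image library. One trade-off is that Base64 enlarges the payload by roughly one third, in exchange for a text-safe message format.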

Yao Li | Liheng Zhao | Fuqiang Luo | Hongjun Chen

[1] Tao Wang, et al. Deep learning with COTS HPC systems, 2013, ICML.

[2] Michael I. Jordan, et al. SparkNet: Training Deep Networks in Spark, 2015, ICLR.