Multimodal Event Processing: A Neural-Symbolic Paradigm for the Internet of Multimedia Things
With the Internet of Multimedia Things (IoMT) becoming a reality, new approaches are needed to process real-time multimodal event streams. Existing approaches to event processing give limited consideration to the challenges of multimodal events, including the need for complex content extraction and the increased computational and memory costs. This article explores event processing as a basis for handling real-time IoMT data and introduces the multimodal event processing (MEP) paradigm, which provides a formal basis for combining native neural multimodal content analysis (i.e., computer vision, linguistics, and audio) with symbolic event processing rules to support real-time queries over multimodal data streams; a multimodal event processing language is used to express single, primitive multimodal, and complex multimodal event patterns. The content of multimodal streams is represented using multimodal event knowledge graphs that capture the semantic, spatial, and temporal content of the streams. The approach is implemented and evaluated within an MEP engine using single and multimodal queries, achieving near real-time performance with a throughput of ~30 frames processed per second (fps) and subsecond latency of 0.075–0.30 s for video streams with a 30 fps input rate. Support for higher input stream rates (45 fps) is achieved through content-aware load-shedding techniques, yielding a ~127x latency improvement with only a minor decrease in accuracy.
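To make the neural-symbolic idea concrete, the sketch below illustrates, in plain Python, how a neural content extractor and a symbolic rule over a temporal window might cooperate to detect a complex multimodal event pattern. This is a minimal illustration only, assuming a hypothetical object-detector stub and a toy co-occurrence rule; the names (FrameDetection, neural_detector, sliding_window, complex_match) are invented for this example and are not the paper's MEP engine or its event processing language.

```python
# Illustrative sketch only: a toy neural-symbolic pipeline in the spirit of MEP.
# All names here are hypothetical; a real MEP engine uses its own query language
# and knowledge-graph representation.
from dataclasses import dataclass
from typing import Iterator, List
from collections import deque


@dataclass
class FrameDetection:
    """A 'primitive multimodal event': one detected object in one video frame."""
    frame_id: int   # temporal content: position in the stream
    label: str      # semantic content produced by a neural detector
    bbox: tuple     # spatial content: (x, y, w, h)


def neural_detector(frame_id: int) -> List[FrameDetection]:
    """Stand-in for neural content extraction (e.g., an object detector).
    A real system would run a vision model on the frame here."""
    return [FrameDetection(frame_id, "person", (10, 20, 50, 80)),
            FrameDetection(frame_id, "car", (120, 40, 90, 60))]


def sliding_window(stream: Iterator[List[FrameDetection]], size: int):
    """Group per-frame detections into temporal windows, as a symbolic
    event-processing engine would before matching patterns."""
    window = deque(maxlen=size)
    for detections in stream:
        window.append(detections)
        if len(window) == size:
            yield list(window)


def complex_match(window) -> bool:
    """A symbolic rule over the window: a person and a car co-occur in
    every frame of the window -- a toy complex multimodal event pattern."""
    return all(
        {"person", "car"} <= {d.label for d in detections}
        for detections in window
    )


if __name__ == "__main__":
    stream = (neural_detector(i) for i in range(90))   # ~3 s of 30 fps video
    for window in sliding_window(stream, size=30):     # 1 s temporal window
        if complex_match(window):
            print("complex multimodal event detected at frame",
                  window[-1][0].frame_id)
            break
```

In the paper's setting, the per-frame neural output would instead be stored in a multimodal event knowledge graph, and patterns such as the one above would be declared in the multimodal event processing language rather than hand-coded; content-aware load shedding would drop low-value frames before the neural stage to sustain higher input rates.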