Platform and applications for massive-scale streaming network analytics

The ability to analyze massive amounts of network traffic data in real time is becoming increasingly important for communication service providers, as it enables them to optimize the use of their service infrastructure and develop innovative revenue-generating opportunities. In particular, the real-time analysis of perishable user traffic (which is not stored because of privacy, regulatory, and other constraints) can provide insights into the use of applications and services by telecommunication subscribers. In this paper, we describe the design and implementation of a novel system for real-time analysis of network traffic based on IBM InfoSphere® Streams, a scalable stream-processing platform. The system provides access to, and analysis of, the data objects and communication patterns of users at the application layer, in contrast to the simple packet- and flow-based analysis that most current systems provide. We discuss our design considerations for such a system and describe the analytics applications developed to showcase its capabilities: online identification of most-frequent objects, online social network discovery, and real-time sentiment analysis. We also present performance results from a pilot deployment of this platform and its applications, which analyzed Internet traffic generated by users at a large corporate research lab.