Real-Time Route 66: Linking External Data Sources

The time budget for processing streaming data can be on the order of milliseconds. Whatever the latency requirements, the first step is always to transport the data to a processing platform, a journey that may span the entire Internet. A pipelined architecture can only be as fast as its slowest link. For this reason, the choice of transport solution can substantially affect performance even before the data has landed in the data center, although the transport is technically not part of your application. With this in mind, this chapter is dedicated to ingesting data from solutions such as Kafka, Flume, and MQTT. Along the way, you will also write your own connector for HTTP to learn the ropes of connecting to external data sources.