The WhiteBoard bolt publishes a message to a white board that a consumer (HttpSpout) has subscribed to. It is a terminating bolt, i.e. it does not forward the tuple further down the topology.
This bolt uses a WhiteBoard to store all subscriptions. The WhiteBoard is static and shared between executors (threads in Storm). A subscription is identified by a unique id and embeds a callback. A consumer (HttpSpout) uses this id to subscribe, and the WhiteBoardBolt uses the same id to publish to the subscription.
Typically an HttpSpout plays the consumer role: when it receives data, it processes it (adds a timestamp and a unique id), subscribes to a queue identified by this unique id, and forwards the tuple further down the topology. The tuple may be processed several times in different topologies. When the WhiteBoard bolt then receives the tuple, it publishes the processed tuple to the matching queue (selected by the unique id). Finally, the callback attached to the subscription is called and the HttpSpout sends the processed tuple back as a response to the HTTP client. If no subscription is attached to the id when the bolt receives the tuple, nothing happens (no callback, no exception): the tuple is ignored.
Because the whiteboard must be shared between the HttpSpout and the WhiteBoardBolt, these two components have to run in the same worker.
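The subscribe and publish flow described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name `WhiteBoardSketch`, its method signatures, and the choice to remove a subscription once it has been served are all assumptions made for the example. The static map is what ties both components to the same worker, since a static field is only visible inside one JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the shared white board; not the project's actual class.
final class WhiteBoardSketch {
    // Static, so every executor thread in the same worker JVM sees the same map.
    private static final Map<String, Consumer<String>> SUBSCRIPTIONS =
            new ConcurrentHashMap<>();

    private WhiteBoardSketch() {}

    // Consumer side (the HttpSpout role): register a callback under a unique id.
    static void subscribe(String id, Consumer<String> callback) {
        SUBSCRIPTIONS.put(id, callback);
    }

    // Producer side (the WhiteBoardBolt role): hand the data to the callback
    // registered under the id. Assumption: a subscription is one-shot, so it is
    // removed when served. Returns false, with no callback and no exception,
    // when nothing is subscribed under the id.
    static boolean publish(String id, String log) {
        Consumer<String> callback = SUBSCRIPTIONS.remove(id);
        if (callback == null) {
            return false;
        }
        callback.accept(log);
        return true;
    }
}
```

In this sketch, the spout side would call `WhiteBoardSketch.subscribe(id, callback)` before emitting the tuple, and the bolt side would call `WhiteBoardSketch.publish(id, log)` with the values it reads from the incoming tuple.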
For the moment, connection failures are not handled.
A new subscription is currently created for each piece of HTTP data received. This should be improved to use one subscription per connection.
Streams And Fields
Subscriptions are identified by the Storm field es_id, and the data published is the field log, so both fields have to be explicitly declared in the topology configuration file:
"fields" : ["log", "es_id"]
This is illustrated next: