org.thales.punch.libraries.storm.spout (Punch Storm Spouts and Bolts 6.4.4 API)

Class Summary
Class	Description
AbstractFileInput	The file spout feeds messages into Storm from one or several files.
AbstractFileInput.Item	Whenever something is read, it is enqueued so that Storm can pick it up in nextTuple().
AbstractKafkaNode	Common kafka input node base class with standard kafka properties definition.
AbstractSocketInput<T>	Base class for socket server spouts: http, lumberjack, relp and plain syslog.
AzureBlobStorageSpout	The AzureBlobStorageSpout enables you to pull data from a given container located in an Azure Blob Storage.
ExtractionInput	Light topology elastic input node designed for extraction
GeneratorInput	The generator publishes fake data in configurable streams.
HttpInput	The Http spout receives http requests and forwards them as tuple.
KafkaInput	The KafkaInput node reads records from a topic and forwards them into a punchline.
LumberjackInput	The Lumberjack spout receives and decodes lumberjack frames, emitted in turn as tuples.
RelpInput	The Relp listening spout.
SFTPSpout	SFTPSpout downloads files that match a given regular expression.
SmtpInput	The SMTP listening spout.
SnmpInput	The SNMP spout receives SNMP traps over UDP. It decodes the SNMP trap messages into a JSON format - cf.
SyslogInput<T>	The Syslog input reads lines from tcp/udp and emits them as tuples.
WrapperCustomInput	Wrapper input node to convert custom node with public API to legacy node with private API

Package org.thales.punch.libraries.storm.spout Description

Punchplatform spouts package.

Overview

The punchplatform spouts are standard Storm spouts. It is easy to code your own and make them available to punch topologies. The few key concepts regarding the way data is forwarded as streams and fields are described here. Refer to each spout javadoc page for details.

Data, Streams and Fields

User Data

Data comes into a spout from an external source: socket, kafka, files. In some cases you receive a single line (i.e. a string), in other cases you receive a map of key-value elements.

In all cases the spout takes these value(s) and forwards them as part of a Storm stream, in the form of a so-called tuple, which is actually a key-value map.
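To make the key-value view of a tuple concrete, here is a minimal Java sketch that pairs declared field names with the values read from a source. The class `TupleSketch` and its `toTuple` method are hypothetical and written only for this illustration; they are not Storm or punch classes, and real spouts emit tuples through the Storm API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: these are not Storm or punch classes.
final class TupleSketch {

    // Pairs each declared field name with the value at the same position,
    // yielding the key-value view of a tuple.
    static Map<String, Object> toTuple(List<String> fields, List<Object> values) {
        if (fields.size() != values.size()) {
            throw new IllegalArgumentException(
                    "expected " + fields.size() + " values, got " + values.size());
        }
        Map<String, Object> tuple = new LinkedHashMap<>();
        for (int i = 0; i < fields.size(); i++) {
            tuple.put(fields.get(i), values.get(i));
        }
        return tuple;
    }

    public static void main(String[] args) {
        // A single received line, carried under a single field named "log".
        Map<String, Object> tuple = toTuple(
                Arrays.asList("log"),
                Arrays.<Object>asList("a raw log line"));
        System.out.println(tuple);
    }
}
```

A source that delivers a map of key-value elements would simply declare several field names instead of one.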

It is your job to configure the spout to emit the fields you want as part of tuples inside streams. Punchplatform topology files allow you to design arbitrary DAGs. You can freely decide how your data is transported, processed and routed to one or several final destinations such as Elasticsearch or an archiving backend.

System Data

You can also configure your spout to handle two additional reserved punchplatform streams, whose names and semantics have a special meaning.

the "_ppf_metrics" stream: most spouts can be configured to emit a latency tracker tuple so as to measure the traversal time of the data through one or several topologies. If you use that feature, this is the stream to use.
the "_ppf_errors" stream: only the punch bolt can generate tuples on that stream. The data contains error information whenever a punchlet throws an exception.

If you run in production, you will likely need both of these punchplatform features. Your goal is to define the path so that both latency metrics and punch errors travel to their final destination.
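As a small illustration of the two reserved names, the following Java sketch splits a list of published stream names into user streams and system streams. The class `ReservedStreams` and its methods are hypothetical helpers written for this page, not part of the punch or Storm API; only the two stream names come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not a punch or Storm class.
final class ReservedStreams {

    // The two reserved stream names described above.
    static final String METRICS = "_ppf_metrics";
    static final String ERRORS = "_ppf_errors";

    private static final Set<String> RESERVED =
            new HashSet<>(Arrays.asList(METRICS, ERRORS));

    // True when the stream name is one of the two reserved system streams.
    static boolean isReserved(String stream) {
        return RESERVED.contains(stream);
    }

    // Returns the stream names that carry user data, in their original order.
    static List<String> userStreams(List<String> published) {
        List<String> result = new ArrayList<>();
        for (String stream : published) {
            if (!isReserved(stream)) {
                result.add(stream);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> published = Arrays.asList("logs", METRICS, ERRORS);
        System.out.println(userStreams(published)); // prints [logs]
    }
}
```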

Example

Here is an example spout configuration. The spout is configured to:

forward the user data as tuples with a single field "log" inside the stream "logs"
generate and forward latency tracker tuples (stream: _ppf_metrics, tuple: [ "_ppf_latency" ])
forward the punch errors received (stream: _ppf_errors, tuple: [ "_ppf_error" ])


 "spouts" : [
  {
   "type" : "one_of_the_spout",
   "spout_settings" : { ... },
   "storm_settings" : {
    "publish" : [
     { "stream" : "logs", "fields" : ["log"] },
     { "stream" : "_ppf_metrics", "fields" : ["_ppf_latency"] },
     { "stream" : "_ppf_errors", "fields" : ["_ppf_error"] }
    ]
   }
  }
 ]

If instead you configure it like this:


 "spouts" : [
  {
   "type" : "one_of_the_spout",
   "spout_settings" : { ... },
   "storm_settings" : {
    "publish" : [
     { "stream" : "logs", "fields" : ["log"] }
    ]
   }
  }
 ]

Only the user data will be forwarded as tuples in your topology.
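The effect of the "publish" declaration can be modelled with a short Java sketch: a tuple is forwarded only when its stream has been declared. This is a simplified model under stated assumptions, not the punch implementation. The class `PublishSketch` and its methods `publish`, `emit` and `emitted` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; not the punch implementation.
final class PublishSketch {

    // Declared streams and their field names, in declaration order.
    private final Map<String, List<String>> declared = new LinkedHashMap<>();

    // Records forwarded tuples as "stream=[values]" strings, for inspection.
    private final List<String> emitted = new ArrayList<>();

    // Mirrors one entry of the "publish" array.
    PublishSketch publish(String stream, String... fields) {
        declared.put(stream, Arrays.asList(fields));
        return this;
    }

    // Forwards the values only if the stream was declared.
    // Returns true when the tuple was forwarded, false when it was dropped.
    boolean emit(String stream, Object... values) {
        List<String> fields = declared.get(stream);
        if (fields == null) {
            return false;
        }
        if (fields.size() != values.length) {
            throw new IllegalArgumentException(
                    "stream " + stream + " expects " + fields.size() + " values");
        }
        emitted.add(stream + "=" + Arrays.toString(values));
        return true;
    }

    List<String> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        // Equivalent of the second configuration: only "logs" is published.
        PublishSketch spout = new PublishSketch().publish("logs", "log");
        spout.emit("logs", "a log line");
        spout.emit("_ppf_metrics", "latency record"); // dropped: not declared
        System.out.println(spout.emitted());
    }
}
```

Adding `publish("_ppf_metrics", "_ppf_latency")` and `publish("_ppf_errors", "_ppf_error")` to the chain would correspond to the first configuration, where the system streams are forwarded as well.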