What is a Channel and a Topology?

The release ships with a sample channel that parses various types of logs. You can start from it to write your own.

A channel is simply a sequence of processing steps that, together, get a job done. The demo channel is very simple; we have named it the “single” channel. It:

  • receives the log,
  • parses it,
  • stores it into Elasticsearch.

You can, however, build more complex channels by:

  • adding Kafka queuing to secure the data,
  • adding a batch processing stage for machine learning,
  • sending the data elsewhere: to an archive, another system, a database, etc.

For each processing stage, we design a dedicated process to handle the task: a Topology. In the demo case, we have set up a single topology that covers the first three points. In a more complex environment, you can split the work into several topologies: one for reception and Kafka storage, one for log parsing, one for insertion, all of them bundled in one channel.
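As a sketch of that split (the field names and layout below are purely illustrative, not the exact PunchPlatform syntax), such a channel could bundle three topologies, each handling one stage:

```json
{
  "tenant" : "mytenant",
  "channel" : "apache",
  "topologies" : [
    { "name" : "reception", "role" : "receive logs and store them in Kafka" },
    { "name" : "parsing",   "role" : "read from Kafka and parse the logs" },
    { "name" : "insertion", "role" : "index the parsed logs into Elasticsearch" }
  ]
}
```

Each topology can then be scaled, stopped, or restarted independently while the channel remains the unit you manage as a whole.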

To sum up:

  • A channel handles an end-to-end concern that you can start and stop as a whole.
  • A channel is made of topologies, but also of other resources (Kafka topics, batch processing, alerting, …).
  • Topologies run in the Storm engine.

Channels and PunchPlatform templating

The channel and the topologies configuration files are stored in a channel directory, in $PUNCHPLATFORM_CONF_DIR/tenants/<tenant_name>/channels/<channel_name>/.

You will find several files in there, which may seem obscure at first. Moreover, many channels look alike and follow the same scheme. That is the rationale behind PunchPlatform templating.

The PunchPlatform Standalone ships with the single channel template, and the four available channels are just instances of it. Check out the content of the file:

$PUNCHPLATFORM_CONF_DIR/tenants/mytenant/configurations_for_channels_generation/lmc/apache_httpd_channel.json

This file contains:

  • the tenant and channel names,
  • the channel template reference,
  • some arguments that make the channel specific (parsers, input port, etc.).

For example:

{
  "tenant" : "mytenant",
  "channel" : "apache",
  "channel_structure_profile" : "lmc/single"
}

Then you can compile your channel simply with the command:

$ cd $PUNCHPLATFORM_CONF_DIR
$ punchplatform-channel.sh --configure tenants/mytenant/configurations_for_channels_generation/lmc/apache_httpd_channel.json

Check out the result in the Apache HTTPd channel.

The point of the punchplatform-channel.sh --start command is simply to set up everything for that journey to work: create a Kafka topic with a given number of partitions and replication factor, and start the Storm topology.

Before leaving, take a look at the generated files, especially $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels/apache_httpd/channel_structure.json. It lists:

  • The Storm Topologies to be launched,
  • The Kafka Topics to be created,
  • The end-to-end probes to check that everything is okay (giving the green square in the PunchPlatform Admin Service).
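A minimal sketch of what such a file conveys is shown below. The field names and values are illustrative assumptions, not the exact PunchPlatform schema; refer to the generated file itself for the real syntax:

```json
{
  "topologies" : [
    { "file" : "single_topology.json" }
  ],
  "kafka_topics" : [
    { "name" : "mytenant.apache", "partitions" : 2, "replication_factor" : 1 }
  ],
  "probes" : [
    { "type" : "end_to_end" }
  ]
}
```

The start command reads this structure to know which topics to create and which topologies to launch.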

Rationale

The PunchPlatform lets you leverage great technologies (Storm, Kafka, Elasticsearch, Kibana, Grafana, and so on) in ways you invent.

Should you do this on your own, dealing with all the pieces required to set up a complete processing chain (a channel) is heavy work: configuring all the components, starting/stopping/restarting all the pieces, monitoring what is going on, and so on. The PunchPlatform saves you invaluable effort and money in setting up a scalable, resilient platform for processing all sorts of data, including logs.

That said, the rationale of the PunchPlatform is not to hide these technologies: you should master them and understand how they work. The PunchPlatform abstractions are lightweight and simple to understand.

Take, for example, the start of a channel. To get a quick view of the various steps required to start a channel, try the --show option of punchplatform-channel.sh. It shows you what would be executed to start or stop a channel.

> punchplatform-channel.sh --show --start mytenant/apache_httpd

You will see that the required actions are among the following:

  • punchplatform-zookeeper.sh: create some Zookeeper roots, if needed.
  • punchplatform-kafka-topics.sh: create the topics required by your channel.
  • punchplatform-channel.sh: start one or several Storm topologies. This may occur several times if you have a multi-topology setup (such as the dual demo setup).

It is that simple. You can execute each step on your own. Check the configuration file for that channel: you will see only a few human-readable JSON documents, with no heavy configuration repeated. Check your Kibana and Grafana dashboards: in minutes you have a running system.

That is our vision of the PunchPlatform: you program it, we make it easy for you to run it on a production cluster.

Understanding Template files

The PunchPlatform uses Jinja2, a simple templating engine. It is pretty much given to you as is: you are in charge of inventing your channels and topologies.

The Jinja templates used for configuration file generation are located under the $PUNCHPLATFORM_CONF_DIR/templates directory. Have a look there: you will find an lmc/single folder with a few Jinja files (suffixed with ‘.j2’). These are the templates used to produce the configuration files.
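To give an idea of how these templates work, here is a minimal sketch of a .j2 fragment. The Jinja placeholders refer to the fields of the generation file shown earlier (tenant, channel); the surrounding JSON content is an illustrative assumption, not an excerpt of a shipped template:

```jinja
{# Hypothetical template fragment: at generation time, the values from
   apache_httpd_channel.json are substituted for the placeholders. #}
{
  "tenant" : "{{ tenant }}",
  "channel" : "{{ channel }}"
}
```

Rendering this with tenant=mytenant and channel=apache would produce the corresponding concrete JSON configuration file.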

By quickly switching from one template set to another you can easily test your channels with or without Kafka for instance. It is handy and often used to generate simpler channels to focus on parsing issues.

Working only with topologies

A channel can be quite simple. But a single topology is even simpler, and you will likely want to design simple topologies to try out various data pipelines: files to Elasticsearch, open data to Kafka, Beats to Kafka, etc.

The easiest way to do that is to write a few topology files directly and run them in the foreground. Writing a topology file requires you to understand the spout and bolt concepts in order to design a processing pipeline: it is basically a publish-subscribe pipeline of processing functions.
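As a sketch of the spout/bolt idea (the type names and settings below are illustrative assumptions; refer to the shipped examples for the actual syntax), a minimal file-to-Elasticsearch topology could look like:

```json
{
  "spouts" : [
    { "type" : "file_spout",
      "settings" : { "path" : "/tmp/input.log" },
      "publish" : [ { "stream" : "logs" } ] }
  ],
  "bolts" : [
    { "type" : "elasticsearch_bolt",
      "settings" : { "index" : "apache" },
      "subscribe" : [ { "stream" : "logs" } ] }
  ]
}
```

The spout publishes a stream of records, and each bolt subscribes to the streams it wants to process: that is the publish-subscribe wiring mentioned above.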

Check out the $PUNCHPLATFORM_CONF_DIR/examples folder, you will find there some useful examples to start with.

Once you have a topology file defined, you can start it by using the punchplatform-topology.sh command.

# Start a topology in foreground. This is very similar to a logstash use case.
# Note that doing this requires no Storm cluster. The punchplatform runs
# topologies using its own lightweight but production-ready Storm engine.
> punchplatform-topology.sh <your topology file>

What to do next