Channels¶
Description¶
A Channel groups several applications into a single unit. Those applications are submitted to an orchestrator (Spark, Storm or Shiva), which takes care of running them.
In Punch, channels are mainly used to group several Punchlines together. However, a channel can group any applications that need to run together.
For example, you can have a channel with the following applications:
- a Logstash input application.
- a Punchline enriching the logs received from Logstash.
- a Punchline indexing the enriched logs into Elasticsearch.
- a housekeeping application that periodically cleans old logs from Elasticsearch.
The Standalone comes with many demo channels in the mytenant tenant.
Have a look at the following folder:
ls -l $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels
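You should see one directory per demo channel. On a standalone it includes, among others, the channels discussed below (the exact list depends on your release):
aggregation  apache_httpd  logstash  sourcefire  stormshield_networksecurity  websense_web_security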
Interactive mode¶
channelctl is the command to manage channels.
To start channelctl in interactive mode, run the command:
channelctl --tenant mytenant
Now you can list the channels of the mytenant tenant:
channelctl:mytenant> status
Here is a quick tour of these demo channels:
- sourcefire, stormshield_networksecurity and websense_web_security are examples of single-punchline channels. They receive, parse, and index logs into Elasticsearch.
- apache_httpd is a two-punchline channel. The first punchline receives, parses and indexes logs into Elasticsearch. The second punchline archives the processed logs on the filesystem. The processed logs are transferred from the first punchline to the second through a Kafka topic.
- logstash: a simple channel launching a Logstash instance.
To start any of these, use the channelctl command:
channelctl:mytenant> start --channel stormshield_networksecurity
Hit Ctrl-D to exit.
The channelctl command line provides plenty of help and auto-completion facilities using the tab key. The rest of this chapter walks through a few examples to get you up to speed quickly.
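For reference, here is the complete lifecycle of a channel in a single interactive session, using only commands already shown in this chapter (output omitted; the exact messages depend on your platform):
channelctl:mytenant> status
channelctl:mytenant> start --channel apache_httpd
channelctl:mytenant> status --channel apache_httpd
channelctl:mytenant> stop --channel apache_httpd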
Apache ingestion example¶
Let's start the apache_httpd channel:
channelctl:mytenant> start --channel apache_httpd
start succeeded: channel: apache_httpd cluster: common application: input
start succeeded: channel: apache_httpd cluster: common application: archiving
Check that the channel was correctly submitted:
channelctl:mytenant> status --channel apache_httpd
This channel started two distinct punchlines: input and archiving.
Now we can inject Apache logs into the input topology:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injectors/mytenant/apache_httpd_injector.json
Your Apache logs are now parsed and indexed into Elasticsearch by the input topology. In addition, they are archived as compressed CSV files under the /tmp/archive-logs/storage directory by the archiving topology.
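You can quickly check both outputs from a terminal. This is a hedged sketch: it assumes the standalone's Elasticsearch listens on localhost:9200 (its default) and that the input punchline indexes into the mytenant-events-* indices used later in this chapter:
# count the documents indexed so far
curl 'localhost:9200/mytenant-events-*/_count?pretty'
# list the archive files produced by the archiving topology
ls -l /tmp/archive-logs/storage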
When you are done, stop the injection with Ctrl-C and stop your channel:
channelctl:mytenant> stop --channel apache_httpd
You can also stop all your running channels:
channelctl:mytenant> stop
Aggregation example¶
Now that you are comfortable with stream processing, let's move on to batch processing. We will run a continuous aggregation channel based on a Spark plan.
This aggregation is executed every minute and fetches all the logs stored in the mytenant-events-* indices.
Here, each minute, we want to compute:
- how many bytes have been written to this index.
- what was the size (in bytes) of the biggest log.
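Conceptually, this is equivalent to the following Elasticsearch aggregation over the last minute of data. This is a hedged sketch for illustration only: the real channel computes this with a Spark punchline, and the @timestamp, vendor and size field names are assumptions, not the actual mapping:
curl -XPOST 'localhost:9200/mytenant-events-*/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-1m" } } },
  "aggs": {
    "by_vendor": {
      "terms": { "field": "vendor" },
      "aggs": {
        "total_size": { "sum": { "field": "size" } },
        "max_size": { "max": { "field": "size" } }
      }
    }
  }
}'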
Before running the aggregation, we need to provide some data. To do so, let's start two channels with channelctl and inject some logs.
channelctl start --channel stormshield_networksecurity
channelctl start --channel websense_web_security
Now that the channels are running, let's inject some logs from another terminal:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injectors/mytenant
Note that the injector is given the whole mytenant injector directory here, so it plays all the injection files it contains. It is important to keep injecting logs in real time, because the aggregation only fetches the last minute of logs. Keep the log injector running and open a new terminal.
From the new terminal, type this command to start the aggregation:
channelctl start --channel aggregation
Wait about one minute, the time for the first aggregation to complete. Then a new Elasticsearch index should show up with the name mytenant-aggregations-YYYY.MM.DD.
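You can also check for the new index from a terminal, assuming as before that Elasticsearch listens on localhost:9200:
curl 'localhost:9200/mytenant-aggregations-*/_search?size=1&pretty'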
Search this index pattern in Kibana and see the results. The documents have the following fields:
{
  "_index": "mytenant-aggregations-2019.05.15",
  "_type": "_doc",
  "_id": "QmHvu2oBm9lH_e9QjytC",
  "_version": 1,
  "_score": null,
  "_source": {
    "total_size_value": 1013339,
    "max_size_value": 298,
    "key": "stormshield",
    "timestamp": "2019-05-15T16:40:00.148+02:00"
  },
  "fields": {
    "timestamp": [
      "2019-05-15T14:40:00.148Z"
    ]
  },
  "sort": [
    1557931200148
  ]
}
As you can see, we get an overview of the total log size and of the largest log size over the last minute, grouped by technology vendor. Note that one event is generated per vendor each minute. The vendor can be found in the "key" field; in this example, the vendor is "stormshield".
Channel Structure¶
You might wonder how your applications are grouped together in a channel, and where they are running. Are they submitted to a Storm, Spark or Shiva cluster?
This depends on the channel_structure.yaml file. When you start a channel, the Punch launcher simply uses that file to schedule, start, stop, or report the status of the corresponding applications.
Here are some examples:
Submit to Shiva cluster¶
version: '6.0'
start_by_tenant: true
stop_by_tenant: true
resources:
  - type: kafka_topic
    name: mytenant_apache_httpd_archiving
    cluster: common
    partitions: 1
    replication_factor: 1
applications:
  - name: input
    runtime: shiva
    command: punchlinectl
    args:
      - start
      - --punchline
      - input.yaml
    shiva_runner_tags:
      - common
    cluster: common
    reload_action: kill_then_start
  - name: archiving
    runtime: shiva
    command: punchlinectl
    args:
      - start
      - --punchline
      - archiving.yaml
    shiva_runner_tags:
      - common
    cluster: common
    reload_action: kill_then_start
The logs of each Shiva application are available under $PUNCHPLATFORM_SHIVA_INSTALL_DIR/logs/<app-name>.
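For instance, after starting the apache_httpd channel, you could watch the input application's logs with something like the following (a hypothetical invocation; adapt the path to the actual layout of your release):
tail -f $PUNCHPLATFORM_SHIVA_INSTALL_DIR/logs/input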
Submit to Storm cluster¶
version: '6.0'
start_by_tenant: true
stop_by_tenant: true
applications:
  - name: input
    runtime: storm
    execution_mode: cluster
    cluster: common
    reload_action: kill_then_start
Submit to Spark cluster¶
version: '6.0'
start_by_tenant: true
stop_by_tenant: true
applications:
  - name: plan-aggregation
    runtime: shiva
    command: planctl
    args:
      - start
      - --plan
      - plan.yaml
      - --template
      - punchline.yaml
      - --runtime
      - spark
      - --spark-cluster
      - common
      - --deploy-mode
      - client
      - --last-committed # persistence
    cluster: common
    shiva_runner_tags:
      - common
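Note that this last example is in fact submitted to Shiva (runtime: shiva): Shiva runs the planctl command, and the plan in turn periodically generates punchlines from the punchline.yaml template and submits them to the common Spark cluster in client deploy mode. The --last-committed flag relates to persistence: it lets the plan resume from its last committed point after a restart, rather than starting over from the current time.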
Resources¶
There is another thing to notice in the channel_structure.yaml files: they can have a resources key. This key is used to create Kafka topics. For instance, in the apache_httpd channel, the Kafka topic mytenant_apache_httpd_archiving is created when the channel is started.
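You can verify that the topic was created using the standard Kafka tooling (a hedged example: it assumes a Kafka broker listening on localhost:9092 and the stock kafka-topics.sh script from your Kafka installation on the PATH):
kafka-topics.sh --bootstrap-server localhost:9092 --list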
Congratulations! You are now ready to build high-performance stream or batch processing applications!