Channels¶
Description¶
A Channel groups several applications into a single unit. Those applications are submitted to an orchestrator (Spark, Storm or Shiva), which takes care of running them.
In Punch, channels are mainly used to group several Punchlines together. However, a channel can group any applications that need to run together.
For example, you can have a channel with the following applications:
- a Logstash input application.
- a Punchline enriching the logs received from Logstash.
- a Punchline indexing the enriched logs into Elasticsearch.
- a housekeeping application that periodically cleans old logs from Elasticsearch.
The Standalone comes with many demo channels in the mytenant tenant.
Have a look at the following folder:
ls -l $PUNCHPLATFORM_CONF_DIR/tenants/mytenant/channels
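You should see one directory per demo channel. On a standalone it includes, among others, the channels discussed below (the exact list depends on your release):
aggregation  apache_httpd  logstash  sourcefire  stormshield_networksecurity  websense_web_security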
Interactive mode¶
channelctl is the command to manage channels.
To start channelctl in interactive mode, run the command:
channelctl --tenant mytenant
Now you can list the channels of the mytenant tenant:
channelctl:mytenant> status
Here is a quick tour of these demo channels:
- sourcefire, stormshield_networksecurity and websense_web_security are examples of single-punchline channels. They receive, parse, and index logs into Elasticsearch.
- apache_httpd is a two-punchline channel. The first punchline receives, parses and indexes logs into Elasticsearch. The second punchline archives the processed logs on the filesystem. The processed logs are transferred from the first punchline to the second through a Kafka topic.
- logstash: a simple channel launching a Logstash instance.
To start any of these, use the channelctl command:
channelctl:mytenant> start --channel stormshield_networksecurity
Hit Ctrl-D to exit.
The channelctl command line provides plenty of help and auto-completion facilities using the tab key. The rest of this chapter walks through a few examples to get you up to speed quickly.
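For reference, here is the complete lifecycle of a channel in a single interactive session, using only commands already shown in this chapter (output omitted; the exact messages depend on your platform):
channelctl:mytenant> status
channelctl:mytenant> start --channel apache_httpd
channelctl:mytenant> status --channel apache_httpd
channelctl:mytenant> stop --channel apache_httpd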
Apache ingestion example¶
Let's start the apache_httpd channel:
channelctl:mytenant> start --channel apache_httpd
start succeeded: channel: apache_httpd cluster: common application: input
start succeeded: channel: apache_httpd cluster: common application: archiving
Check that the channel was correctly submitted:
channelctl:mytenant> status --channel apache_httpd
This channel started two distinct punchlines: input and archiving.
Now we can inject Apache logs into the input topology:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injectors/mytenant/apache_httpd_injector.json
Your Apache logs are now parsed and indexed into Elasticsearch by the input topology. In addition, they are archived as compressed CSV files under the /tmp/archive-logs/storage directory by the archiving topology.
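You can quickly check both outputs from a terminal. This is a hedged sketch: it assumes the standalone's Elasticsearch listens on localhost:9200 (its default) and that the input punchline indexes into the mytenant-events-* indices used later in this chapter:
# count the documents indexed so far
curl 'localhost:9200/mytenant-events-*/_count?pretty'
# list the archive files produced by the archiving topology
ls -l /tmp/archive-logs/storage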
When you are done, stop the injection with Ctrl-C and stop your channel:
channelctl:mytenant> stop --channel apache_httpd
You can also stop all your running channels:
channelctl:mytenant> stop
Aggregation example¶
Now that you are comfortable with stream processing, let's move on to batch processing. We will run a continuous aggregation channel based on a Spark plan.
This aggregation is executed every minute and fetches all the logs stored in the mytenant-events-* indices.
Here, each minute, we want to compute:
- how many bytes have been written to this index.
- what was the size (in bytes) of the biggest log.
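Conceptually, this is equivalent to the following Elasticsearch aggregation over the last minute of data. This is a hedged sketch for illustration only: the real channel computes this with a Spark punchline, and the @timestamp, vendor and size field names are assumptions, not the actual mapping:
curl -XPOST 'localhost:9200/mytenant-events-*/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-1m" } } },
  "aggs": {
    "by_vendor": {
      "terms": { "field": "vendor" },
      "aggs": {
        "total_size": { "sum": { "field": "size" } },
        "max_size": { "max": { "field": "size" } }
      }
    }
  }
}'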
Before running the aggregation, we need to provide some data. To do so, let's start two channels with channelctl and inject some logs.
channelctl start --channel stormshield_networksecurity
channelctl start --channel websense_web_security
Now that the channels are running, let's inject some logs from another terminal:
punchplatform-log-injector.sh -c $PUNCHPLATFORM_CONF_DIR/resources/injectors/mytenant
Note that the injector is given the whole mytenant injector directory here, so it plays all the injection files it contains. It is important to keep injecting logs in real time, because the aggregation only fetches the last minute of logs. Keep the log injector running and open a new terminal.
From the new terminal, type this command to start the aggregation:
channelctl start --channel aggregation
Wait about one minute, the time for the first aggregation to complete. Then a new Elasticsearch index should show up with the name mytenant-aggregations-YYYY.MM.DD.
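You can also check for the new index from a terminal, assuming as before that Elasticsearch listens on localhost:9200:
curl 'localhost:9200/mytenant-aggregations-*/_search?size=1&pretty'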
Search this index pattern in Kibana and see the results. The documents have the following fields:
{
  "_index": "mytenant-aggregations-2019.05.15",
  "_type": "_doc",
  "_id": "QmHvu2oBm9lH_e9QjytC",
  "_version": 1,
  "_score": null,
  "_source": {
    "total_size_value": 1013339,
    "max_size_value": 298,
    "key": "stormshield",
    "timestamp": "2019-05-15T16:40:00.148+02:00"
  },
  "fields": {
    "timestamp": [
      "2019-05-15T14:40:00.148Z"
    ]
  },
  "sort": [
    1557931200148
  ]
}
As you can see, we get an overview of the total log size and of the largest log size over the last minute, grouped by technology vendor. Note that one event is generated per vendor each minute. The vendor can be found in the "key" field; in this example, the vendor is "stormshield".
Channel Structure¶
You might wonder how your applications are grouped together in a channel, and where they are running. Are they submitted to a Storm, Spark or Shiva cluster?
This depends on the channel_structure.yaml file. When you start a channel, the Punch launcher simply uses that file to schedule, start, stop, or report the status of the corresponding applications.
Here are some examples:
Submit to Shiva cluster¶
version: '6.0'
start_by_tenant: true
stop_by_tenant: true
resources:
  - type: kafka_topic
    name: mytenant_apache_httpd_archiving
    cluster: common
    partitions: 1
    replication_factor: 1
applications:
  - name: input
    runtime: shiva
    command: punchlinectl
    args:
      - start
      - --punchline
      - input.yaml
    shiva_runner_tags:
      - common
    cluster: common
    reload_action: kill_then_start
  - name: archiving
    runtime: shiva
    command: punchlinectl
    args:
      - start
      - --punchline
      - archiving.yaml
    shiva_runner_tags:
      - common
    cluster: common
    reload_action: kill_then_start
The logs of each Shiva application are available under $PUNCHPLATFORM_SHIVA_INSTALL_DIR/logs/<app-name>.
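For instance, after starting the apache_httpd channel, you could watch the input application's logs with something like the following (a hypothetical invocation; adapt the path to the actual layout of your release):
tail -f $PUNCHPLATFORM_SHIVA_INSTALL_DIR/logs/input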
Submit to Storm cluster¶
version: '6.0'
start_by_tenant: true
stop_by_tenant: true
applications:
  - name: input
    runtime: storm
    execution_mode: cluster
    cluster: common
    reload_action: kill_then_start
Submit to Spark cluster¶
version: '6.0'
start_by_tenant: true
stop_by_tenant: true
applications:
  - name: plan-aggregation
    runtime: shiva
    command: planctl
    args:
      - start
      - --plan
      - plan.yaml
      - --template
      - punchline.yaml
      - --runtime
      - spark
      - --spark-cluster
      - common
      - --deploy-mode
      - client
      - --last-committed # persistence
    cluster: common
    shiva_runner_tags:
      - common
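Note that this last example is in fact submitted to Shiva (runtime: shiva): Shiva runs the planctl command, and the plan in turn periodically generates punchlines from the punchline.yaml template and submits them to the common Spark cluster in client deploy mode. The --last-committed flag relates to persistence: it lets the plan resume from its last committed point after a restart, rather than starting over from the current time.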
Resources¶
There is another thing to notice in the channel_structure.yaml files: they can have a resources key. This key is used to create Kafka topics. For instance, in the apache_httpd channel, the Kafka topic mytenant_apache_httpd_archiving is created when the channel is started.
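You can verify that the topic was created using the standard Kafka tooling (a hedged example: it assumes a Kafka broker listening on localhost:9092 and the stock kafka-topics.sh script from your Kafka installation on the PATH):
kafka-topics.sh --bootstrap-server localhost:9092 --list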
Congratulations! You are now ready to build high-performance stream or batch processing applications!