Channel Configuration

Overview

Abstract

This chapter explains how you assemble various jobs into a channel.

Because it involves several distinct parts, a channel configuration consists of several files, all grouped in a common per-tenant folder. As an example, here is the layout of one of the sample channels (delivered as part of the PunchPlatform standalone installation). This channel is representative of a log management solution.

punchplatform.properties
   /** main platform configuration file **/

resources
    elastalert
          /** rules of elastalert **/
    elasticsearch
          /** elasticsearch mapping templates **/
    injector
          /** log injector to simulate logs **/
    kibana
          /** kibana dashboards **/
    punch
          /** the punchlets and associated resource files **/

 tenants
    mytenant
        channels
            apache_httpd
                channel_structure.json
                single_topology.json
        etc
            /** tenant configuration directory **/

Before explaining the content of each file, here is the big picture:

  • mytenant is the name of the tenant. Tenant names are user-defined; this one was chosen in the demo setup for illustrative purposes.
  • apache_httpd is the name of a channel: all Apache logs for this tenant will be handled by this channel (assuming, of course, that all such Apache logs can be routed to that particular channel). This name is user-defined.
  • the resources/punch directory contains the platform punchlets. You can define punchlets for all tenants, as in this example, or only for a tenant or a channel. The PunchPlatform comes with a complete set of example parsers, organized in several subfolders.

In a given channel directory, you find a [channel_structure.json] file and one or several other files. The [channel_structure.json] file defines the structure of the channel. A channel can be composed of:

  • stream applications: for example Spark Streaming or Storm applications
  • batch applications: typically Spark PML jobs
  • various tasks: applications you execute periodically for various functional purposes

Info

It is good practice to put only functional tasks into a channel, and to group administrative tasks into a service.

Channel Structure

Let us focus on a channel structure file.

Layout

The channel_structure.json file has the following content:

{
    // version is usually the one of the main punch release.
    // This is to guarantee backward compatibility. 
    version : "5.0"

    // A channel can include several processing units.
    jobs : [
        { job part }
    ]
    // and potentially some shared resources such as kafka topics
    resources : [
      { resource part }
    ]
}

Jobs can be of three types: storm, spark or shiva. Whatever their type, they all share the following three properties:

{
    // a unique name.
    name: myjob

    // the name of the cluster in charge of running that job.
    // A cluster can be a Storm|Spark|Shiva or other cluster, 
    // as long as the corresponding cluster is defined in the 
    // punchplatform.properties configuration file.
    cluster: main

    // one of `none` or `kill_then_start`.
    // Using `none` makes the job unaffected by a
    // channel reload order. This property is optional and
    // defaults to `kill_then_start`.
    reload: none

    // other properties
    ...
}

Note

the unique name of every job is <tenant_name>/<channel_name>/<cluster>/<job_name>.
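
For instance, with the demo layout shown earlier, the single_topology job of the apache_httpd channel, submitted to the main cluster of tenant mytenant, gets the unique name:

```
mytenant/apache_httpd/main/single_topology
```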

Storm Jobs

Storm jobs are in fact topologies. These are ever-running streaming apps.

{
    // the name MUST refer to a local single_topology.json file
    // where the storm topology is described.
    name: single_topology

    type : storm

    // the name of your target storm cluster
    cluster: main

    reload_action: kill_then_start
}

Spark Job

Spark jobs are defined using a pml configuration file. Declare a pml job here only if that job is an ever-running streaming application. Otherwise, i.e. if the spark job is a batch processing unit that eventually ends, submit it to shiva (see below).

Here is an example

{
    type : spark

    // This must correspond to a local my_spark_app.pml file
    name : my_spark_app

    // the name of a declared spark cluster
    cluster : main
}

Shiva Jobs

Shiva jobs can be either ever-running stream apps, or batch apps that eventually terminate. You can request shiva to periodically relaunch your batch applications.

Here is an example that requests shiva to start a logstash daemon.

{
    // the short name of the task. The task unique name
    // will appear as <tenant>_<channel>_<name>.
    name : my_shiva_job

    // the command must be an executable.
    command : logstash

    // your command arguments 
    args : [ "-f" , "logstash.yaml" ]

    // the associated resources, if any. Here you must provide
    // the logstash.yaml file. The accepted resources are files or
    // folders.
    resources : [
        "logstash.yaml"
    ]

    // the target shiva cluster. This key must be associated to a
    // shiva cluster as defined in your punchplatform.properties file.
    cluster : common

    // the tags used to place your task on the shiva node(s) you want.
    shiva_runner_tags : [ "standalone" ]

    // an optional cron expression should you require periodic
    // scheduling of your task. Here is an example to execute
    // it every 30 seconds
    // quartzcron_schedule : 0/30 * * * * ? *
}
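
For illustration, a Quartz cron expression has seven fields (seconds, minutes, hours, day of month, month, day of week, year). Assuming you wanted to run the task above once a day at midnight instead, the property would read:

```hjson
// run the task every day at 00:00:00
quartzcron_schedule : "0 0 0 * * ? *"
```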

Shiva is a lightweight distributed runtime engine. It takes care of running your application on one or several nodes in a robust and resilient way. Refer to the Shiva chapter for details.

Using the punch and shiva, you can execute two kinds of applications:

  1. the ones provided and fully integrated by the punchplatform. These are described next.
  2. your own. Simply provide an executable command. Shiva will take care of executing it on the target servers. It is, however, your responsibility to equip the target server(s) with the environment necessary to run your task.
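
For the second kind, a minimal sketch could look as follows. The my_cleanup.sh script and its arguments are purely hypothetical; any executable runnable on the target shiva node(s) would do:

```hjson
{
    type: shiva
    // hypothetical task running a custom script
    name: my_custom_task
    command: my_cleanup.sh
    args: [ "--retention", "30" ]
    // ship the script itself along with the task
    resources: [
        "my_cleanup.sh"
    ]
    cluster: common
    shiva_runner_tags: [ standalone ]
}
```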

Logstash

Logstash is fully integrated, as long as you selected the corresponding shiva deployment option.

pml

You can run pml batch jobs directly from Shiva. This is the simplest way to execute a spark job. Note that using shiva requires you to select the spark client deploy mode.

{
    type: shiva
    name: myjob
    command: pml
    args: [
        "--job", "job.pml",
        "--deploy-mode", "foreground"
    ]
    resources: [
        job.pml
    ]
    cluster: common
    shiva_runner_tags: [ standalone ]
}

plan

Plans are used to periodically run pml jobs using advanced templating capabilities. Typically, a plan is used whenever you want to run a job at specific time intervals, consuming specific ranges of timed data.

The corresponding command is plan. As you must provide both a job template and a plan configuration file, the example below uses a folder.

{
    type: shiva
    name: myplan
    command: plan
    args: [
        "--plan", "plan.hjson",
        "--template", "job.template",
        "--deploy-mode", "foreground"
    ]
    resources: [
        plan.hjson
        job.template
    ]
    cluster: common
    shiva_runner_tags: [ standalone ]
}
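
To give an idea of the templating at work, a job template typically contains placeholders that the plan fills in with computed values (such as a time window) before submitting the resulting pml job. The fragment below is a purely illustrative sketch; the variable names and the exact syntax are described in the Plan chapter:

```hjson
// job.template (illustrative fragment only)
// the plan substitutes the {{from}} and {{to}} placeholders
// with the boundaries of the current time window
from : "{{from}}"
to : "{{to}}"
```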

topology

In addition to running storm topologies in a Storm cluster, the punch also supports a lightweight single-process storm engine. It can be used on small configurations to run topologies without the cost of operating a storm cluster.

The corresponding shiva command is topology.

{
    type: shiva
    name: mytopology
    command: topology
    args: [ "--mode", "light", "--topology", "topology.json" ]
    resources: [
        topology.json
    ]
    cluster: common
    shiva_runner_tags: [ standalone ]
}

Jar

Shiva can also execute an external jar application.

11
{
    type: shiva
    name: myjar
    command: java
    args: [ "-jar", "myjar.jar" ]
    resources: [
        myjar.jar
    ]
    cluster: common
    shiva_runner_tags: [ standalone ]
}

Resources

The resources part of the channel_structure.json file lets you define global resources potentially shared by your channel jobs. As of today only kafka topic resources are supported.

resources : [
    {
        type: kafka_topic

        name : mytenant_arkoon_output

        // the logical Kafka cluster name. A corresponding entry must appear
        // in your punchplatform.properties file.
        cluster : local

        // the number of partitions for this topic. The more partitions,
        // the more scalable your channel is.
        partitions : 4

        // the number of replicas for each partition. 2 is the minimum
        // to achieve high availability.
        replication_factor : 1
    }
]

Advanced Options

The channel structure file accepts the following additional options.

{
    // true by default. If set to false, the channel will only start
    // if you use a specific per-channel start command. It will not start
    // if you start the whole tenant.
    start_by_tenant : true


    // true by default. If set to false, the channel will not stop if you issue
    // a tenant-level stop command. Stopping it requires a dedicated per-channel command.
    // This is to prevent unwanted stops of important administrative or critical channels.
    stop_by_tenant : true
}