
punchplatform.properties

Abstract

The punchplatform.properties file describes what your platform is made of. It is used both at deploy time and at runtime, by channels and jobs, to retrieve key configuration properties.

Overview

The punchplatform.properties file is first used when you deploy a new production PunchPlatform. Declare there how many Elasticsearch, Kafka, Storm, Spark, etc. nodes you need; in turn, the punchplatform deployer tool will install everything where needed.

Second, it is used at runtime by the various PunchPlatform commands and services. For example, whenever you start a channel that (say) reads some data from Kafka, this file will be used to get the corresponding Kafka cluster IP addresses. That is why it will also be part of your installed PunchPlatform.

What you find in the punchplatform.properties file is:

  • the names, hosts, ports and URLs of all inner services
    • Storm, Zookeeper, Elasticsearch cluster addresses
    • Kibana URLs
  • some key configuration parameters
    • Java processes memory size
    • cluster settings
    • virtual ip addresses

Because of its key role, this file is often referred to from other chapters.

Important

In this file, when node names are provided (as server lists or dictionary keys in various clusters), the provided host names will be used to reach the machines from the deployment environment, and must therefore be resolvable and reachable from that environment.

When no other specific setting exists to indicate the network interface to which the services will be bound, the node hostnames may also be used by the cluster frameworks to communicate with each other; therefore they should resolve to the production interface on these machines, to avoid production data flowing through administration networks.

Location

If you are reading this because you are performing a fresh installation, your goal will be to create the punchplatform.properties file from scratch.

The punchplatform.properties file is located in your platform configuration directory. Remember you have a PUNCHPLATFORM_CONF_DIR environment variable defining that location.

On production platforms, you will notice that punchplatform.properties may be a symbolic link to the actual file, itself located in a platforms/<platformName> subdirectory, where platformName is usually production; it can be different in pre-production or testing environments. On a standalone platform, there is no symbolic link, only a plain file.

When using the PunchPlatform command-line tools, the PunchPlatform configuration root folder must be provided using the PUNCHPLATFORM_CONF_DIR environment variable.

Reference configuration in zookeeper and relation to (optional) git repository

Because the operator environment is distributed (multiple operators, multiple machines), the reference configuration is stored in the administration zookeeper cluster. It must be retrieved using punchplatform-getconf.sh -pf before making changes to the platform configuration, and saved using punchplatform-putconf.sh -pf after changes, so that other operators can access it later (and, for example, see the up-to-date version in the punchplatform kibana plugin).

Note that it is ALSO a strongly advised good practice to maintain git configuration management of your punchplatform configuration tree. You should always commit and push the updated reference configuration to your central git repository. This does NOT make git the 'official' reference for the applicable configuration of your platform: the reference version is the one that has been stored using punchplatform-putconf.sh, so always work using this reference version, even if you maintain a git repository for traceability/rollback purposes.
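A typical operator session thus looks like this (a sketch using the commands above; the git step assumes you maintain such a repository):

# fetch the reference configuration from zookeeper into your configuration directory
punchplatform-getconf.sh -pf

# ... edit punchplatform.properties or other configuration files ...

# store the updated configuration back as the new reference
punchplatform-putconf.sh -pf

# optionally, keep a traceability copy in your central git repository
git add -A && git commit -m "update punchplatform.properties" && git push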

Content

File format

This file is a JSON file in which you are free to add #-prefixed comments (this is not standard JSON, though).
Do not forget the enclosing brackets ({}) and pay attention to the commas (,) at the end of each properties block and at the end of the file. You can test your JSON syntax by using sed 's/#.*//g' punchplatform.properties | jq .
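For instance, the following (minimal, hypothetical) fragment is accepted by the punchplatform tooling, because comments are stripped before parsing:

{
    # comments like this one are tolerated here, although not standard JSON
    "platform": {
        "platform_id": "punchplatform-primary"
    }
}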

Avoid any Encoding issue

To avoid any encoding issue, you should use only upper/lower-case non-accentuated alphanumeric characters for all your ids, hostnames, cluster names, tags and so on.

Remember that in JSON surrounding a number with quotes changes its type from Number to String.
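For example, assuming a port setting documented as a Number:

"http_api_port" : 9200     # a Number, as expected
"http_api_port" : "9200"   # a String, which does not match the documented type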

Documentation conventions

Each component provides an example of configuration that needs to be adapted to your environment.
The list of parameters then follows:

  • name.of.parameter: Type DefaultValueIfExist
    • Description of the parameter.

If the Type is in bold, the parameter is mandatory when this settings section is present.

In the following we detail every configuration parameter.

Platform

Each platform is associated with a few configuration properties. These are grouped in a dictionary section. In particular each platform is thus uniquely identified. This identifier appears in turn in metrics tags, typically forwarded from one platform to another.

"platform": {
    "platform_id" : "punchplatform-primary",
    "reporters" : [
        {
            "type": "elasticsearch",
            "cluster_name": "es_search"
        }
    ]
}
  • platform_id (string)

    The unique platform identifier. This identifier must be unique worldwide.

  • reporters (map[])

    A list of reporter configurations. This section lets you define where the platform components will send their logs and metrics. Every start/stop command will be traced, along with useful information. Shiva will also log its own actions, and redirect its child jobs' logs.

    For the elasticsearch reporter, logs will be sent to the "platform-logs-[yyyy.MM.dd]" index and the metrics to "platform-metrics-[yyyy.MM.dd]".

    To see all the available reporters, please refer to the dedicated reporter section.

Zookeeper

ZooKeeper is a distributed coordination service used by many applications. It exposes its service to client applications as a distributed filesystem.
Zookeeper is used by several of the PunchPlatform components: Storm, Kafka, Kafka consumers (kafka_spout). It is also used by the PunchPlatform services to store various important configuration or runtime data.

"zookeeper" : {
    "clusters" : {
        "common" : {
            "hosts" : ["node01","node02","node03"],
            "cluster_port" : 2181,
            "punchplatform_root_node" : "/punchplatform-primary"
        }
    },
    "install_dir": "/data/opt/apache-zookeeper-3.5.5-bin"
},
  • clusters.<clusterId> : String

    • the clusterId is a string composed of alphanumeric characters and [-]. It is used by PunchPlatform command-line tools and various configuration files to refer to the corresponding cluster.
    • There can be one or several zookeeper.clusters.[clusterId] sections, depending on your platform(s) setup. Multiple clusters are typically used to define several zones with different security levels and data flows restrictions.
    • The clusterIds must be unique in the scope of a PunchPlatform. Note that if you define only one zookeeper cluster in your platform, most PunchPlatform commands will automatically use it as the default cluster, without the need to provide explicit identifiers.
  • clusters.<clusterId>.hosts : String

    • Comma-separated array of the hostnames of the zookeeper servers that are part of this zookeeper cluster. At least 3 must be provided for resilience;
      only an ODD number can be provided. This is a zookeeper requirement to avoid split-brain scenarios.
    • These hostnames are used by PunchPlatform commands that need to find
      an available node in the Zookeeper cluster. This parameter should match the
      actual list of servers configured in the running zookeeper cluster
      (see the zoo.cfg file in your zookeeper cluster configuration; this file is generated automatically when using the PunchPlatform deployment tools).
    • Watch out: these names must EXACTLY match the result of executing the hostname command on the corresponding servers.
  • clusters.<clusterId>.cluster_port : Number

    • TCP port used by the Zookeeper cluster, i.e. all Zookeeper nodes will bind to that port to communicate with client applications as well as with each other.
  • clusters.<clusterId>.punchplatform_root_node : String

    • this string defines the zookeeper root path, starting with /. All PunchPlatform zookeeper data will be stored under that path. This parameter is mandatory, in particular whenever you share a common cluster between several platforms (see the sketch at the end of this section).
  • clusters.<clusterId>.supervisor: Undefined

    • Zookeeper nodes are supervised by supervisord. Its logrotate parameters can be configured in this section.
  • install_dir: String

    • path to the zookeeper installation directory, where the zookeeper bin and log directories are expected to be located.
    • These are used by the PunchPlatform command-line tools to create/check zookeeper nodes, check the zookeeper cluster status, start a standalone zookeeper service should you run a standalone platform, etc.
    • Note : on a standalone platform the zookeeper installation directory is set to $INSTALL_DIR/external.
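As an illustration of root-path isolation, two platforms can share the same zookeeper cluster provided they use distinct root nodes (a sketch; host and platform names are hypothetical):

# platform A
"zookeeper" : {
    "clusters" : {
        "common" : {
            "hosts" : ["node01","node02","node03"],
            "cluster_port" : 2181,
            "punchplatform_root_node" : "/punchplatform-a"
        }
    }
}
# platform B declares the same hosts and port, but uses
# "punchplatform_root_node" : "/punchplatform-b"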

Elasticsearch

Elasticsearch is a document-based database. It indexes JSON documents to provide advanced search capabilities. In particular, it provides the backends (business data or metrics data) of the Kibana frontend applications.

"elasticsearch" : {
    "clusters" : {
        "es_search" : {
            "nodes" : {
                "node01" : {
                    "http_api_address" : "node01",
                    "transport_address" : "node01",
                    "bind_address" : "0.0.0.0",
                    "rack_id" : "1"
                },
                "node02" : {
                    "http_api_address" : "node02",
                    "transport_address" : "node02",
                    "bind_address" : "_eth1_",
                    "rack_id" : "2"
                },
                "node03" : {
                    "http_api_address" : "node03",
                    "transport_address" : "node03",
                    "bind_address" : "_eth1_",
                    "rack_id" : "3"
                }
            },
            "http_api_port" : 9200,
            "cluster_production_transport_address" : "node0a",
            "transport_port" : 9300,
            "minimum_master_nodes": 1,
            "settings_by_type" : {
                "data_node": {
                    "max_memory": "2048m",
                    "modsecurity_enabled": false,
                    "modsecurity_blocking_requests": false,
                    "script_execution_authorized": true,
                    "http_cors_enabled" : true,
                    "readonly" : true
                }
            },
            "plugins":{
              "opendistro_security": {
                "ssl_transport_certs_dir": "/data/pp-conf/es_security/certs",
                "ssl_transport_pemkey_name": "admin-key-pkcs8.pem",
                "ssl_transport_pemcert_name": "admin-cert.pem",
                "ssl_transport_pemtrustedca_name": "rootca-cert.pem",
                "nodes_dn": ["emailAddress=admin@thalesgroup.com,CN=admin,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"]
              }
            }
        }
    }
},
  • clusters.<clusterId>: String

    • clusterId is a string composed of alphanumeric characters and underscore (_), used by PunchPlatform commands and configuration files to uniquely identify and refer to a given Elasticsearch cluster. When elasticsearch metrics reporting is configured, that clusterId is also used for generating metrics names.

    • A PunchPlatform can contain one or several Elasticsearch clusters, hence one or several JSON elasticsearch.clusters.<clusterId> sections.

    • Each clusterId must be unique within the scope of a single PunchPlatform. If you define only one elasticsearch cluster in your
      platform, most PunchPlatform commands will automatically use it as the default cluster, without the need to provide explicit identifiers.

  • clusters.<clusterId>.nodes.<nodeHostname>: String

    • Dictionary indexed by the hostnames of the nodes composing the Elasticsearch cluster.
      These hostnames will be used by the PunchPlatform cluster deployment tool. The dictionary values are described below.
  • clusters.<clusterId>.nodes.<nodeHostname>.http_api_address: String

    • FQDN (domain) or IP of the REST API provided by the Elasticsearch node on the production network.
  • clusters.<clusterId>.nodes.<nodeHostname>.transport_address: String

    • FQDN or IP of the internal Elasticsearch communication port exposed on the production network (reachable by the other nodes of the Elasticsearch cluster).
      This parameter is used for PunchPlatform channel deployment, when using the transport protocol for document indexing, in order to directly send data from storm topologies to the cluster data nodes.
  • clusters.<clusterId>.nodes.<nodeHostname>.bind_address: String "default ES interface"

    • When provided, this parameter defines the network
      address(es) to which the elasticsearch node will bind (and therefore
      listen for incoming request). This can be provided in all forms
      supported by the elasticsearch host parameter.

    • If not provided, the bind address will be determined using default
      elasticsearch production interface provided in the deployment.settings.

    • Typical value on a cluster setup: not provided for main elasticsearch
      cluster nodes, nor for metrics-backend elasticsearch
      cluster nodes (GRAxxx) if such a separate cluster exists.

  • clusters.<clusterId>.nodes.<nodeHostname>.type: String "data_node"

    • The type of the node: data_node, master_node, client_node or only_data_node (i.e. a data node without the master role enabled).
      By default, an elasticsearch node is a data_node.

    • This parameter is used for PunchPlatform cluster deployment, in order to automatically configure the Elasticsearch cluster nodes to set the data and master directive in the elasticsearch configuration file.

  • clusters.<clusterId>.nodes.<nodeHostname>.tag: String

    • Nodes can be tagged. Tags allow indices placement, i.e. an index can be placed on a particular set of nodes according to their
      tags. This can be used in Hot/Warm architectures, for example.
      The index/tags mapping has to be declared in the Elasticsearch mappings (or
      mapping templates in PunchPlatform) through the settings.index.routing.allocation.require.box_type parameter.
    • Note: it uses the node.box_type Elasticsearch property.
  • clusters.<clusterId>.http_api_port: Integer

    • Listening port number of the REST API provided by the elasticsearch nodes on the production network (see the check example at the end of this section).
  • clusters.<clusterId>.cluster_production_transport_address: String

    • Listening network address or interface name of the cluster when using transport protocol (to be used with transport_port parameter).
      This address will be used for example for metrics-containing clusters, to allow access by the channel health monitoring service of the PunchPlatform admin server.
  • clusters.<clusterId>.api_hosts_for_monitoring: String[]

    • this optional array of strings of the form "host:port" can be provided when the platform monitoring daemon (shiva cluster) is not able to directly reach the Elasticsearch API using the hosts and ports from the "nodes" settings. This is the case if this shiva cluster is running in a separate admin area, with no routing to the Elasticsearch cluster production network interface.

    • e.g. "api_hosts_for_monitoring" : [ "myelasticsearchvip.admin.network:9200" ]

  • clusters.<clusterId>.transport_port: Integer

    • Listening port number of the internal Elasticsearch communication port exposed on the production network (reachable by the other nodes of the Elasticsearch cluster).
  • clusters.<clusterId>.minimum_master_nodes: Integer

    • Defines the setting used to prevent split-brain situations, by configuring the majority of master-eligible nodes (total = number of nodes / 2 + 1)
  • clusters.<clusterId>.recover_after_nodes: Integer 0

    • Recover as long as this many data or master nodes have joined the cluster. Default value: 0
  • clusters.<clusterId>.expected_nodes: Integer 0

    • The number of (data or master) nodes that are expected to be in the cluster. Recovery of local shards will start as soon as the expected number of nodes have joined the cluster. Defaults to 0
  • clusters.<clusterId>.recover_after_time: String "5m"

    • If the expected number of nodes is not achieved, the recovery process waits for the configured amount of time before trying to recover regardless. Defaults to 5m if one of the expected_nodes settings is configured.
  • clusters.<clusterId>.settings_by_type: Object

    • defines settings per node type (e.g. data_node)
  • clusters.<clusterId>.settings_by_type.client_type: String "data_node"

    • Refers to the Elasticsearch node type. By default it is data_node. It can also be client_node.
  • clusters.<clusterId>.settings_by_type.client_type.max_memory: String

    • Maximum size of each Elasticsearch node's JVM memory. The rule of thumb is half the size of the VM RAM, assuming one elasticsearch server per VM. It should be kept below 32G, so that the JVM does not have to use large-size pointers and memory tables.
  • clusters.<clusterId>.settings_by_type.client_type.modsecurity_enabled: Boolean

    • Enable (true) or disable (false) the installation and the configuration of modsecurity
  • clusters.<clusterId>.settings_by_type.client_type.modsecurity_blocking_requests: Boolean true

    • When set to false (typically during the integration phase of a PunchPlatform), modsecurity runs in non-blocking mode.
  • clusters.<clusterId>.settings_by_type.client_type.script_execution_authorized: Boolean

    • Enable (true) or disable (false) the execution of scripts through elasticsearch. This setting must be set to true to display all grafana dashboards properly. We recommend setting it to false on customer-facing elasticsearch clusters for security purposes.
  • clusters.<clusterId>.settings_by_type.client_type.http_cors_enabled: Boolean

    • Enable (true) or disable (false) cross-origin resource sharing, i.e. whether a browser on another origin can do requests to Elasticsearch
  • clusters.<clusterId>.settings_by_type.client_type.readonly: Boolean

    • Enable (true) or disable (false) the modsecurity readonly mode. It will deny search, visualization and dashboard creation for the user.
  • clusters.<clusterId>.override_elasticsearch_version: String

    • In some cases, especially after a migration of elasticsearch with the snapshot mechanism, you may want to switch the
      elasticsearch version for only one cluster, usually the query one.
  • clusters.<clusterId>.supervisor: Undefined

    • Elasticsearch nodes are supervised by supervisord. Its logrotate parameters can be configured in this section.
  • clusters.<clusterId>.plugins: Object

    • Elasticsearch plugins.
  • clusters.<clusterId>.plugins.opendistro_security: Object

    • Elasticsearch Opendistro Security plugin configuration.
  • clusters.<clusterId>.plugins.opendistro_security.ssl_transport_certs_dir: String

    • SSL certificates directory
  • clusters.<clusterId>.plugins.opendistro_security.ssl_transport_pemkey_name: String

    • Certificate key file name with extension. Example: admin-key-pkcs8.pem
  • clusters.<clusterId>.plugins.opendistro_security.ssl_transport_pemcert_name: String

    • Certificate file name with extension. Example: admin-cert.pem
  • clusters.<clusterId>.plugins.opendistro_security.ssl_transport_pemtrustedca_name: String

    • Trusted Certificate Authority file name with extension. Example: rootca-cert.pem
  • clusters.<clusterId>.plugins.opendistro_security.nodes_dn: String[]

    • Distinguished Name (DN). Example ["emailAddress=admin@thalesgroup.com,CN=admin,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"]
  • install_dir: String

    • STANDALONE Mandatory: Path to the elasticsearch setup that is used
      only by the PunchPlatform cluster deployment tool, and by the
      PunchPlatform standalone distribution command-line tools punchplatform-standalone.sh and punchplatform-elasticsearch.sh when running a local
      elasticsearch node.

    • Initial value on standalone PunchPlatform setup : elasticsearch
      distribution directory deployed by PunchPlatform distribution in its $INSTALL_DIR/external directory
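Once an elasticsearch cluster is deployed, a quick sanity check is to query the standard cluster health endpoint on the declared http_api_port (node name taken from the example above):

curl "http://node01:9200/_cluster/health?pretty"
# the reported cluster name should match your configuration, and the status should be green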

Kibana

Kibana is a frontend application that allows a user to search and display data from Elasticsearch.

"kibana" : {
    "domains" : {
        "admin" : {
            "es_cluster_target": "es_search",
            "es_type_of_nodes_targeted": "data_node",
            "kibana_port" : 5601,
            "type" : "administration"
        }
    },
    "servers" : {
        "node01" : {
            "address" : "0.0.0.0"
        },
        "node02" : {
            "address" : "node02"
        },
        "node03" : {
            "address" : "node03"
        }
    },
     "plugins":{
       "opendistro_security":{
         "ssl_verification_mode": "none",
         "elasticsearch_username": "kibanaserver",
         "elasticsearch_password": "kibanaserver"
       },
       "punchplatform": {
         "zookeeper_cluster": "punchplatform",
         "spark_cluster": "spark_main",
         "extraction_path": "/home/vagrant/extractions",
         "tmp_path": "/home/vagrant/tmp",
         "job_editor_tenant": "mytenant",
         "job_editor_index": "jobs",
         "platform_editor_tenants": ["mytenant"]
       }
     }
},
  • domains.<domainName>: String

    • Dictionary defining each instance (domain) of kibana
  • domains.<domainName>.es_cluster_target: String

    • Name of the Elasticsearch cluster that this kibana is allowed to contact
  • domains.<domainName>.es_type_of_nodes_targeted: String

    • Elasticsearch node type that this kibana is allowed to contact
  • domains.<domainName>.kibana_port: Integer

    • TCP port used to access kibana over HTTP
  • domains.<domainName>.type: String "external_customer"

    • The default value is external_customer, which connects the kibana through the front reverse proxy; the other value, administration, connects the kibana through the admin proxy (and can also be accessed directly from the administration network)
  • domains.<domainName>.servers: String[]

    • List of hostnames (no IPs). It allows starting kibana instances only on the described hosts. To deploy this feature properly, you have to: 1) fill the kibana.servers section with ALL kibana servers; 2) select a few of these servers in this setting to start the domain on them only. Must be filled in kibana.domains.<domainName>.servers
  • servers.<serverName>: String

    • Dictionary of per-server parameters
  • servers.<serverName>.address: String

    • Address used to bind the kibana process
  • chrooted: Boolean false

    • Set to true to enable this function. Only taken into account if there is only one domain specified. If set to false, the running instances of Kibana won't be jailed in a chroot. Chrooting is recommended in production.
  • install_dir: String

    • STANDALONE Mandatory. Path to the kibana setup that is used by punchplatform-standalone.sh and punchplatform-kibana.sh for starting a local kibana server when running PunchPlatform in a sample configuration, and is also used to configure kibana automatically when deploying a PunchPlatform

    • Initial value on standalone PunchPlatform setup : kibana distribution directory deployed by PunchPlatform distribution in its $INSTALL_DIR/external directory

  • apm_install_dir: String

  • STANDALONE Mandatory. Path to the apm-server used by the Punchplatform plugin to send logs into elasticsearch

Kibana Opendistro plugin

  • kibana.plugins.opendistro_security.ssl_verification_mode: String

    • SSL verification mode. Default: none
  • kibana.plugins.opendistro_security.elasticsearch_username: String

    • Elasticsearch default username.
  • kibana.plugins.opendistro_security.elasticsearch_password: String

    • Elasticsearch default password.

Kibana Punch plugin

  • kibana.plugins.punchplatform.tmp_path

    • Default "/tmp/punch". Set temporary working directory for Punch plugin. Kibana user must have write rights
  • kibana.plugins.punchplatform.documentation_enabled

    • Default true. Enables documentation component
  • kibana.plugins.punchplatform.documentation_path

    • Default "../../doc/html". Set the Punch documentation root directory.
  • kibana.plugins.punchplatform.documentation_version

    • MANDATORY. Set the Punch documentation version.
  • kibana.plugins.punchplatform.zookeeper_cluster

    • MANDATORY. Set Zookeeper cluster name to use
  • kibana.plugins.punchplatform.zookeeper_hosts

    • Alternative to zookeeper_cluster. Format: ["{zk_host}:{zk_port}"]. Sets the Zookeeper hosts used to pull the Punchplatform configuration (see the sketch at the end of this section)
  • kibana.plugins.punchplatform.zookeeper_root_node

    • Default: resolved from zookeeper_cluster. Required when zookeeper_hosts is used. Sets the Zookeeper root node used to pull the Punchplatform configuration
  • kibana.plugins.punchplatform.security_enabled

    • Default false. Enables role based security.
  • kibana.plugins.punchplatform.security_roles

    • Default ["pp_plugin_view", "pp_data_extraction", "pp_configuration_view", "pp_configuration_edit", "pp_platform_channel_control", "pp_pml_view", "pp_pml_execute", "pp_testers_view"]. Set default roles for unauthenticated user - Specify which roles are allowed. Read Kibana Punch plugin roles.
  • kibana.plugins.punchplatform.tools_enabled

    • Default true. Enables Punch tools component
  • kibana.plugins.punchplatform.extraction_enabled

    • Default true. Enables Data extraction component
  • kibana.plugins.punchplatform.extraction_path

    • Default "/tmp/extractions". Set output directory to store extractions in case of CSV/JSON output. Kibana user must have write rights

  • kibana.plugins.punchplatform.extraction_tenant

    • Default "kibana". Set tenant name for all extraction performed from kibana
  • kibana.plugins.punchplatform.extraction_index

    • Default "kibana-jobs". Set extraction elasticsearch index name. All extraction will be saved in this index.
  • kibana.plugins.punchplatform.job_editor_enabled

    • Default true. Enables Job editor component
  • kibana.plugins.punchplatform.job_editor_tenant

    • Default "kibana". Set tenant name for all Job editor from kibana Job editor component
  • kibana.plugins.punchplatform.job_editor_foreground_enabled

    • Default true. Allows "Execute Foreground" from Job editor
  • kibana.plugins.punchplatform.job_editor_background_enabled

    • Default true. Allows "Execute Background" from Job editor
  • kibana.plugins.punchplatform.job_editor_index

    • Default "kibana-jobs". Set Job editor elasticsearch index name. All jobs will be saved in this index.
  • kibana.plugins.punchplatform.spark_cluster

    • MANDATORY. Set Spark cluster name to use
  • kibana.plugins.punchplatform.analytics_background_options

    • Default ["--job", "{{job}}", "--spark-master", "spark://{spark_cluster_host}:{spark_cluster_port}", "--deploy-mode", "cluster"] where spark_cluster_host and spark_cluster_port are resolved from spark_cluster configuration. Set Analytics background options. {{job}} will dynamically replaced by the effective job.
  • kibana.plugins.punchplatform.analytics_foreground_options

    • Default ["--job", "{{job}}"]. Set Analytics foreground options. {{job}} will dynamically replaced by the effective job.
  • kibana.plugins.punchplatform.analytics.scanner_options

    • Default []. Set Analytics scanner options.
  • kibana.plugins.punchplatform.platform_editor_enabled

    • Default true. Enables Platform studio component
  • kibana.plugins.punchplatform.platform_editor_tenants

    • Default []. Restricts to specific tenants. Leave empty for all tenants
  • kibana.plugins.punchplatform.platform_editor_channels_enabled

    • Default true. Enables channel management (start/stop/reload)
  • kibana.plugins.punchplatform.ioc_enabled

    • Default true. Enables IOC component
  • kibana.plugins.punchplatform.ioc_index

    • Default "kibana-ioc". Set IOC files elasticsearch index name. All files will be saved in this index.
  • kibana.plugins.punchplatform.ioc_max_size_mb

    • Default 1024. Set maximum size for one IOC file in MBytes
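To illustrate the zookeeper_hosts alternative mentioned above, a minimal sketch of the plugin section (host names and values are hypothetical):

"punchplatform": {
    "zookeeper_hosts": ["node01:2181", "node02:2181", "node03:2181"],
    "zookeeper_root_node": "/punchplatform-primary",
    "spark_cluster": "spark_main",
    "documentation_version": "5.5"
}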

Storm

Storm is a scalable compute cluster. It runs Java code such as punchlets and all the PunchPlatform components (spouts & bolts).

"storm" : {
    "clusters" : {
        "main": {
            "master" : {
                "servers" : ["node01", "node02", "node03"],
                "cluster_production_address" : "node0a"
                },
            "ui" : {
                "servers" : ["node01", "node02", "node03"],
                "cluster_admin_url": "node0a:8080"
            },
            "slaves" : ["node01", "node02", "node03"],
            "zk_cluster" : "common",
            "zk_root" : "storm-1.2.2-main",
            "storm_workers_by_punchplatform_supervisor" : 15,
            "workers_childopts" : "-Xmx1G",
            "supervisor_memory_mb" : 8192,
            "supervisor_cpu" : 4
        }
    },
    "install_dir": "/data/opt/apache-storm-1.2.2"
},
  • clusters.<clusterId>: String

    • The Storm cluster identifier. A string composed of letters and numbers. A single Punchplatform can contain several storm clusters.
  • clusters.<clusterId>.master.servers: String[]

    • A comma-separated array of hostnames of the servers that will run Storm so-called nimbus processes in charge of scheduling the starting/stopping of topologies.
  • clusters.<clusterId>.master.thrift_port: Integer 6627

    • The thrift TCP Port used for storm inter communication. Default value: 6627
  • clusters.<clusterId>.ui.servers: String[]

    • A comma-separated array of hostnames of the servers that will run the Storm so-called ui server, providing a built-in monitoring web interface and an associated REST API.
  • clusters.<clusterId>.ui.ui_port: Integer 8080

    • The listening TCP port of the Storm ui servers. Default value: 8080
  • clusters.<clusterId>.slaves: String[]

    • A comma-separated array of hostnames of the servers that will run Storm so-called supervisor processes in charge of starting/stopping topologies, as requested by the nimbus.
  • clusters.<clusterId>.zk_cluster: String

    • Identifier of the zookeeper cluster in which the Storm cluster will store its internal cluster management and synchronization data. This must be one of the keys in zookeeper.clusters dictionary documented previously.
  • clusters.<clusterId>.zk_root: String

    • This string is a prefix (composed of letters, digits or '-') that is used as the root of all data paths in the zookeeper cluster for data associated to the storm cluster. This allows sharing a same zookeeper cluster between multiple Storm clusters; it should therefore be unique within a zookeeper cluster (both unique within the PunchPlatform system, but also unique as compared to other zookeeper roots configured in other PunchPlatforms using the same zookeeper cluster). We recommend including the storm version to avoid issues during migrations.
  • clusters.<clusterId>.storm_workers_by_punchplatform_supervisor: Integer

    • This number indicates the number of Storm slave slots
      that are allowed on each storm slave node (i.e. running Storm supervisor component).

    • This field, multiplied by the JVM memory options of each slave (see workers_childopts field hereafter) should not exceed Storm slave server memory.

  • clusters.<clusterId>.workers_childopts: String

    • This string provides the storm worker jvm options. If an
      empty string is provided, then default Storm settings will be
      applied to these VMs.
    • Storm workers are in charge of running topologies.
    • This value, multiplied by the number of Storm slots (see the storm_workers_by_punchplatform_supervisor field above),
      should not exceed the Storm slave server memory (see the sizing sketch at the end of this section).
  • clusters.<clusterId>.supervisor_memory_mb: Integer

    • This number provides the size of the RAM of the virtual or physical node. It is used to configure storm.yaml for the storm supervisor.

  • clusters.<clusterId>.supervisor_cpu: Integer

    • This number provides the number of CPUs of the virtual or physical node. It is used to configure storm.yaml for the storm supervisor.
  • clusters.<clusterId>.master.monitoring_interval: Integer 60

    • The period (in seconds) of cyclical acquisition of metrics/supervisor status by Nimbus. Legacy deployments use 10, but increasing this value reduces the load of the nimbus service and improves the availability of the Storm UI/API.
  • clusters.<clusterId>.master.supervisor_timeout: Integer 90

    • The timeout (in seconds) for declaring a supervisor non-nominal when nimbus monitors it. The legacy setting is 10s. Increasing this value in relation with 'monitoring_interval' helps avoid false positives of failed supervisors in loaded situations. As a tradeoff, reassigning topologies to a surviving supervisor node, in case of loss of the supervisor node previously assigned these topologies, will take longer.
  • clusters.<clusterId>.supervisor: Undefined

    • Storm components are supervised by supervisord. Its logrotate parameters can be configured in this section.
  • install_dir: String

    • Path to the Storm directory containing the Storm bin/ directory (on Storm nodes, on Punchplatform admin servers, and on LTR receivers), providing the Storm binaries.

    • This will be used by punchplatform-standalone.sh and punchplatform-storm.sh for starting a local Storm
      cluster when running PunchPlatform in a sample
      configuration, and is also used to configure storm components
      automatically when deploying using PunchPlatform cluster deployment tool.

    • Initial value on standalone PunchPlatform setup : Storm distribution
      directory deployed by PunchPlatform distribution in its $INSTALL_DIR/external directory

  • clusters.<clusterId>.published_storm_hostname_source: String

    • This setting determines the hostname that storm will publish in zookeeper so that other nodes can contact this one. It MUST therefore be a name that resolves to the production interface of this node when resolved on other cluster nodes. It can take different values:
    • "inventory": storm will publish the hostname set in the configuration files as the production interface.
    • "server_local_fqdn": storm will publish the local server fqdn as the production interface.
    • "server_local_hostname": storm will publish the local server hostname as the production interface.
    • "auto" (the default): storm will choose which hostname it publishes in zookeeper.

Kafka

Kafka is a resilient and scalable queueing application. It stores documents for several days. Usually, it is used upstream of storm to keep data safe.

"kafka" : {
    "clusters" : {
        "local" : {
            "brokers" : ["node01:9092", "node02:9092", "node03:9092"],
            "zk_cluster" : "common",
            "zk_root" : "kafka-local",
            "brokers_config" : "punchplatform-local-server.properties",
            "default_replication_factor" : 1,
            "default_partitions" : 2,
            "partition_retention_bytes" : 1073741824,
            "partition_retention_hours" : 24,
            "kafka_brokers_jvm_xmx": "512M"
        }
    },
    "install_dir" : "/data/opt/kafka_2.11-1.1.0"
}

There is another configuration for kafka (BETA):

"kafka" : {
    "clusters" : {
        "local" : {
          "brokers_with_ids" : [
              {"id" : 1, "broker" : "node01:9092" },
              {"id" : 2, "broker" : "node02:9092" },
              {"id" : 3, "broker" : "node03:9092" }
          ],
          ...
        }
    }
}
  • clusters.<clusterId>: String

    • clusterId is a string composed of alphanumeric characters and [_] which will be used each time this particular kafka cluster must be identified in a PunchPlatform command-line or configuration file, and also for metrics name generation when elasticsearch reporting is activated by PunchPlatform configuration.

    • There can be one or multiple kafka.clusters.<clusterId> sections, depending on the overall deployment configuration (for example: in order to use different storage configurations for brokers that manage different kinds of logs, or to ensure performance isolation between different log channels). Kafka clusterIds must be unique in a PunchPlatform cluster.

    • Please note that if only one kafka cluster is identified in the punchplatform properties file, most PunchPlatform commands will automatically use this kafka cluster without the need to provide a clusterId on the command line.

  • clusters.<clusterId>.brokers[]: String[]

    • Comma-separated array of the nodes providing all the kafka brokers in this cluster.

    • This parameter is used for PunchPlatform commands that need to find an available node in the kafka cluster or to start a local kafka broker when running a standalone PunchPlatform configuration.

  • clusters.<clusterId>.zk_cluster: String

    • String identifying the PunchPlatform zookeeper cluster that this kafka cluster will use to persist/exchange its internal configuration, topics, partitions and offsets. This must be one of the keys in zookeeper.clusters dictionary documented previously.

    • This parameter will be used by all PunchPlatform kafka clients (producers and consumers) that will need to locate available kafka brokers for this cluster, because available clusters register themselves in zookeeper.

  • clusters.<clusterId>.zk_root: String

    • This string is a prefix (composed of letters, digits or '-') that is used as root of all data path in the zookeeper cluster, for data associated to the kafka brokers cluster. This allows sharing a same zookeeper cluster for multiple Kafka brokers clusters ; therefore it should be unique within a zookeeper cluster (both unique within the PunchPlatform system, but also unique as compared to other zookeeper roots configured in other PunchPlatform for the same zookeeper cluster).
  • clusters.<clusterId>.brokers_config: String

    • Path to the local kafka broker server configuration. This parameter is used by punchplatform-standalone.sh and punchplatform-kafka.sh when running a local kafka broker server in a PunchPlatform sample configuration. When using punchplatform cluster deployment tool, this field is used to generate the Kafka brokers cluster configuration on Kafka servers.
  • clusters.<clusterId>.default_replication_factor: Integer

    • Default replication level for Kafka topic partitions. This is used whenever no replication factor is defined in
      the channel structure configuration (cf. Channels).
      A value of 1 means no replication, therefore no resilience in case
      of failure of a cluster broker.
  • clusters.<clusterId>.default_partitions: Integer

    • Default number of partitions for each Kafka topic, whenever no partitions number is defined in the
      channel structure configuration (cf. Channels).

    • Partitions allow scaling the processing by sharding the
      responsibility of consuming Kafka messages between multiple consumer
      instances (if configured in the Storm topology).

  • clusters.<clusterId>.partition_retention_bytes: Long

    • Maximum size-based retention policy for logs. Kafka applies the first condition met to delete data (either time or size), so this parameter is a failsafe to avoid that any single channel fills up the platform storage in case of flooding of a topic.

    • In a typical cluster setup, we limit each channel PARTITION, for example to:
      1000 events per second x 1000 bytes x 2 days, i.e. a typical value of 172800000000 (bytes);
      or 4000 logs per second for 2 days of flooding, plus 1 day of additional nominal storage (2500 lps), with 3000 bytes per enriched log, i.e. 1099511627776 bytes for a tenant topology (see the worked example at the end of this section).

  • clusters.<clusterId>.partition_retention_hours: Integer

    • Maximum time-based retention policy (applies if the size-based retention policy is not triggered first by the amount of data received).
  • clusters.<clusterId>.kafka_brokers_jvm_xmx: String

    • The max heap size allowed to each kafka broker JVM (this will be used by the kafka startup script).
  • clusters.<clusterId>.supervisor: Undefined

    • Kafka nodes are supervised by supervisord. Its logrotate parameters can be configured in this section.
  • install_dir: String

    • path to the kafka directory containing kafka bin/ directory, providing kafka client shells and binaries, and kafka log/ directory.

    • This will be used by punchplatform-standalone.sh and punchplatform-kafka.sh for starting a local Kafka
      cluster when running PunchPlatform in a sample
      configuration, and is also used to configure kafka components
      automatically when deploying using PunchPlatform cluster deployment tool.

    • Initial value on standalone PunchPlatform setup : kafka distribution directory deployed by PunchPlatform distribution in its
      $INSTALL_DIR/external directory
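To make the retention sizing rule above concrete, a worked example (hypothetical figures following the first rule above):

# 1000 events/s x 1000 bytes x 2 days (172800 seconds)
# = 172 800 000 000 bytes per partition
"partition_retention_bytes" : 172800000000,
"partition_retention_hours" : 48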

Shiva

Shiva is the platform distributed task scheduler. It is used to schedule user (channel) tasks, as well as administration and platform services.

A shiva cluster consists of shiva participants communicating through a zookeeper cluster. Each participant can act as the leader and/or as a worker executing submitted tasks.

Here is the related section in the punchplatform.properties file.

"shiva": {
    "clusters": {
        "common": {
            "zk_cluster": "common",
            "reporters" : [
                  {
                    "type": "elasticsearch",
                    "cluster_name": "es_search"
                  }
            ],
            "servers": {
                "localhost": {
                    "runner": true,
                    "can_be_master": true,
                    "tags": []
                }
            },
            "plugins": [ "logstash", "spark", "storm" ]
        }
    },
    "install_dir": "/data/opt/punchplatform-shiva-5.5.1"
}
  • clusters.<clusterId>.zk_cluster

    Mandatory

    Identifier of the zookeeper cluster in which the Shiva cluster will store its internal cluster management and synchronization data. This must be one of the keys in the [zookeeper.clusters] dictionary documented previously.

  • clusters.<clusterId>.servers.<serverName>

    Mandatory

    For each shiva node to be deployed, a section containing the configuration of the node. The server name is used for resolving the administration interface of the shiva node from the deployment machine.

  • clusters.<clusterId>.servers.<serverName>.runner

    Mandatory

    Boolean indicating if this shiva node will have the 'runner' role. Runners are in charge of locally executing the tasks assigned to them by the leader (active master).

  • clusters.<clusterId>.servers.<serverName>.can_be_master

    Mandatory

    Boolean indicating if this shiva node can become the leader of the cluster. The leader is in charge of assigning tasks to an appropriate node (given the current balancing of tasks among the available runners that match the task tags requirements).

    If no leader is available, runners will keep executing their assigned services, but no resilience is possible in case of a runner shutdown, and no new task or periodic job execution will occur.

  • clusters.<clusterId>.servers.<serverName>.tags

    OPTIONAL

    List of comma-separated tag strings. This is useful only for worker nodes. Tags are user-defined information strings associated to each node.

    When submitting a task to the Shiva cluster, the user can specify tags. This allows task placement depending on user needs such as network areas, pre-installed modules required for running the task, etc. (see the sketch at the end of this section).

    Default value: ansible_hostname

  • clusters.<clusterId>.shiva_cluster_jvm_xmx

    OPTIONAL

    The max size allowed to each Shiva node JVM (used by the Shiva startup script).

  • clusters.<clusterId>.plugins

    OPTIONAL

    The list of plugins to install on this specific cluster. Available keywords are "logstash", "spark" and "storm". These names refer to those defined in the punchplatform-deployment.settings.

    default value: [] (empty JSON array)

  • reporters

    OPTIONAL

    The reporters section lets you define where the shiva workers will send the logs related to their scheduled tasks.

    In addition, the leader node also publishes the platform-level monitoring status to these reporters.

    As of today, only the elasticsearch reporter is defined. If you omit it, each shiva node will log the same information locally in its local log4j logger.

  • install_dir

    STANDALONE Mandatory

    The path to the shiva setup directory
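As an illustration of tag-based placement, a sketch of a two-node cluster (server names and the "dmz" tag are hypothetical):

"servers": {
    "node01": { "runner": true, "can_be_master": true, "tags": [] },
    "node02": { "runner": true, "can_be_master": false, "tags": ["dmz"] }
}
# a task submitted with the "dmz" tag will only be placed on node02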

Spark

Apache Spark is an open-source cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.

"spark" : {
      "clusters" : {
          "spark_main": {
              "master" : {
                  "servers" : ["node01"],
                  "listen_interface" : "eth0",
                  "master_port" : 7077,
                  "rest_port": 6066,
                  "ui_port" : 8081
              },
              "slaves" : {
                  "node01" : {
                      "listen_interface" : "eth0",
                      "slave_port" : 7078,
                      "webui_port" : 8084
                  },
                  "node02" : {
                      "listen_interface" : "eth0",
                      "slave_port" : 7078,
                      "webui_port" : 8084
                  },
                  "node03" : {
                      "listen_interface" : "eth0",
                      "slave_port" : 7078,
                      "webui_port" : 8084
                  }
              },
              "spark_workers_by_punchplatform_spark": 1,
              "zk_cluster" : "common",
              "zk_root" : "spark-2.4.0-main",
              "slaves_cpu" : 4,
              "slaves_memory" : "1G"
          }
      },
      "install_dir": "/data/opt/spark-2.4.0-bin-hadoop2.7"
  },
  • clusters.clusterId

    Mandatory: clusterId is a string composed of alphanumeric characters. The clusterId must be unique.

  • clusters.<clusterId>.master

    Mandatory: JSON content containing the spark master settings.

  • cluster.<clusterId>.master.servers

    Mandatory: a list of servers on which a spark master will be installed.

Some issues have been observed when using hostnames here; use IP addresses.

  • cluster.<clusterId>.master.listen_interface

    Mandatory: interface to bind spark master.

  • cluster.<clusterId>.master.master_port

    Mandatory: Integer. TCP port used by the Spark master.

  • cluster.<clusterId>.master.rest_port

    Mandatory: Integer. TCP port used by the Spark master for application submission.

  • cluster.<clusterId>.master.ui_port

    Mandatory: Integer. TCP port used by the UI of the Spark master.

  • clusters.<clusterId>.slaves

    Mandatory: Dictionary indexed by the hostnames of the nodes composing the Spark cluster.

  • clusters.<clusterId>.slaves.nodeHostname.listen_interface

    Mandatory: Network interface on which to bind your spark slave.

  • clusters.<clusterId>.slaves.nodeHostname.slave_port

    Mandatory: Integer. TCP port used by the spark slave.

  • clusters.<clusterId>.slaves.nodeHostname.webui_port

    Mandatory: Integer. TCP port used by the spark slave UI.

  • clusters.<clusterId>.spark_workers_by_punchplatform_spark

    Mandatory: Integer. Number of workers per slave.

  • clusters.<clusterId>.zk_cluster

    Mandatory: Id of the zookeeper cluster used by the Spark masters for high availability.

  • clusters.<clusterId>.zk_root

    Mandatory: String used by the spark masters as the zookeeper root path for their high-availability data.

  • clusters.<clusterId>.slaves_cpu

    Mandatory: Integer. Number of CPUs allocated to each slave.

  • clusters.<clusterId>.slaves_memory

    Mandatory: String. Amount of memory allocated to each slave (e.g. "1G").

  • clusters.<clusterId>.metrics

    OPTIONAL

    Metrics reporter configuration. At the moment only elasticsearch is supported.

    Example: metrics.elasticsearch.cluster_id: "es_search"

  • install_dir

    Mandatory: Path to the spark setup directory

Ceph

Ceph is the scalable, distributed object storage facility used by Punchplatform for archiving, for delivering the CephFS distributed multi-mountable filesystem, or for the S3-compatible object storage REST API.

Please note that at the moment, the Punchplatform deployer does not provide an automated means of running the REST API component. This component can be activated on a CEPH admin station (see ceph.admin in punchplatform_deployment.settings) by referring to the CEPH documentation of the [ceph-rest-api] command and of the associated configuration.

The following section is used at runtime by the Punchplatform embedded monitoring system. The full Ceph deployment description is provided in the punchplatform_deployment.settings file.

If this section is not provided, the embedded monitoring system will not monitor ceph resources.

"ceph" : {
    "clusters" : {
        "main" : {
            "admin" : {
                "http_api_port" : 5050,
                "servers" : [ "node1", "node2" ]
            }
        }
    }
}
  • clusters.<clusterId>.admin.http_api_port

    Mandatory

    Port of the CEPH REST API service on the CEPH admin api nodes. This should match the deployment settings of the CEPH admin api nodes from the punchplatform_deployment.settings document: ceph.clusters.[cluster_name].admin_rest_apis.[node_name].listening_port

  • clusters.<clusterId>.admin.servers

    Mandatory

    Array of server addresses to be used for connecting to the CEPH Admin API nodes, from the Shiva cluster nodes that are allowed to run the punchplatform embedded monitoring function (Shiva leader).

    This should match the deployment settings of the CEPH admin api nodes from the deployment settings document: ceph.clusters.[cluster_name].admin_rest_apis.[node_name].listening_address

Elastic Beats

Auditbeat

Auditbeat is a small component that collects system calls and file integrity events and sends them to

an Elasticsearch cluster:

"auditbeat" : {
    "reporting_interval" : 30,
    "auditd" : [
        { "hosts" : ["node01"], 
          "audit_rule" : [
            "-w /etc/passwd -p wa -k identity",
            "-a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access" 
          ]
        }
    ],
    "file_integrity" : [
        { "hosts" : ["node01"], "paths" : ["/bin"] },
        { "hosts" : ["node02", "node3"], "paths" : ["/bin", "/usr/bin"], "recursive": true, "exclude_files": ["~$"] }
    ],
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"auditbeat" : {
    "reporting_interval" : 30,
    "auditd" : [
        {
          "hosts" : ["node01"],
          "audit_rule" : [
            "-w /etc/passwd -p wa -k identity",
            "-a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access" 
          ]
        }
    ],
    "file_integrity" : [
        { "hosts" : ["node01"], "paths" : ["/bin"] },
        { "hosts" : ["node02", "node3"], "paths" : ["/bin", "/usr/bin"], "recursive": true, "exclude_files": ["~$"] }
    ],
    "kafka" : {
        "cluster_id" : "local"
    }
}
  • reporting_interval (integer)

    The time in seconds between two reports

  • auditd.hosts (string[])

    A list of hosts. Auditbeat will be installed on these servers to execute audit rules.

  • auditd.audit_rule (string)

    A string containing the audit rules that should be installed to the kernel.
    There should be one rule per line.
    Comments can be embedded in the string using # as a prefix.
    The format for rules is the same used by the Linux auditctl utility.
    Auditbeat supports adding file watches (-w) and syscall rules (-a or -A).

  • file_integrity.hosts (string[])

    A list of hosts. Auditbeat will be installed on these servers to check file integrity.

  • file_integrity.paths (string[])

    A list of paths (directories or files) to watch. Globs are not supported.
    The specified paths should exist when the metricset is started.

  • file_integrity.exclude_files (string[])

    A list of regular expressions used to filter out events for unwanted files.
    The expressions are matched against the full path of every file and directory.
    By default, no files are excluded. See Regular expression support for a list of supported regexp patterns.
    It is recommended to wrap regular expressions in single quotation marks to avoid issues with YAML escaping rules.

  • recursive (boolean: false)

    By default, the watches set on the paths specified in paths are not recursive.
    This means that only changes to the contents of these directories are watched.
    If recursive is set to true, the file_integrity module will watch for changes in these directories and all their subdirectories.

Cephbeat

Cephbeat is a small component that sends ceph metrics (number of objects degraded, misplaced, ...) to an Elasticsearch cluster:

"cephbeat" : {
    "reporting_interval" : 30,
    "metricsets" : [ "cluster", "pools" ],
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"cephbeat" : {
    "reporting_interval" : 30,
    "metricsets" : [ "cluster", "pools" ],
    "kafka" : {
        "cluster_id" : "local",
        "topic_name" : "platform-system-metrics"
    }
}
  • reporting_interval (integer, mandatory)

    Interval in seconds used by cephbeat to report ceph metrics.

  • metricsets (string[], mandatory)

    The metricsets to use.

    Available values: "cluster" and "pools".

    Their names are explicit: the cluster metricset reports metrics about the whole cluster, and the pools metricset reports metrics for each pool.

  • elasticsearch (map)

    This section enables the elasticsearch reporter

  • elasticsearch.cluster_id (string, mandatory)

    Name of the elasticsearch cluster used to store ceph metrics.

  • kafka (map)

    This section enables the kafka reporter

  • kafka.cluster_id (string, mandatory)

    Name of the kafka cluster.

  • kafka.topic_name (string, mandatory)

    Name of the kafka topic to store metrics from cephbeat

Filebeat

Filebeat is a small component that sends system logs to an Elasticsearch cluster:

"filebeat" : {
    "files" : [
        { "hosts" : ["node01"], "path" : ["/var/log/auth.log"] },
        { "hosts" : ["node02"], "path" : ["/var/log/syslog"] }
    ],
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"filebeat" : {
    "files" : [
        { "hosts" : ["node01"], "path" : ["/var/log/auth.log"] },
        { "hosts" : ["node02"], "path" : ["/var/log/syslog"] }
    ],
    "kafka" : {
        "cluster_id" : "local",
        "topic_name" : "filebeat-topic"
    }
}
  • files (map[], mandatory)

    This section contains a list of hosts and paths to monitor.

  • elasticsearch (map)

    This section enables the elasticsearch reporter

  • elasticsearch.cluster_id (string, mandatory)

    Name of the elasticsearch cluster used to store the logs collected by filebeat.

  • kafka (map)

    This section enables the kafka reporter

  • kafka.cluster_id (string, mandatory)

    Name of the kafka cluster.

  • kafka.topic_name (string, mandatory)

    Name of the kafka topic to store the logs from filebeat.

  • install_dir (string)

    STANDALONE Mandatory

    Path to the filebeat directory containing filebeat [bin/] directory

Metricbeat

Metricbeat is a small component that sends system metrics to:

An Elasticsearch Cluster:

"metricbeat" : {
    "modules" : {
        "system" : {
            "high_frequency_system_metrics": {
                "metricsets" : ["cpu","load","memory"],
                "reporting_interval" : "30s"
            },
            "normal_frequency_system_metrics": {
                "metricsets" : ["fsstat"],
                "reporting_interval" : "5m"
            },
            "slow_frequency_system_metrics": {
                "metricsets" : ["uptime"],
                "reporting_interval" : "1h"
            }
        }
    },
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"metricbeat" : {
    "modules" : {
        "system" : {
            "high_frequency_system_metrics": {
                "metricsets" : ["cpu","load","memory"],
                "reporting_interval" : "30s"
            },
            "normal_frequency_system_metrics": {
                "metricsets" : ["fsstat"],
                "reporting_interval" : "5m"
            },
            "slow_frequency_system_metrics": {
                "metricsets" : ["uptime"],
                "reporting_interval" : "1h"
            }
        }
    },
    "kafka" : {
        "cluster_id" : "local",
        "topic_name" : "platform-system-metrics"
    }
}
  • reporting_interval (integer)

    Interval in seconds used by metricbeat to report system metrics.

  • servers (map)

    To monitor external servers by deploying metricbeat on them, you can provide a list of additional hosts. In the end, metricbeat will be deployed on all the servers composing the PunchPlatform, plus these additional servers.

  • modules (string, mandatory)

    Names of the metricbeat modules.

  • modules.[module_name] (string, mandatory)

    Name of a dedicated custom metricbeat metricset.

  • modules.[module_name].[metric_name].metricsets (string, mandatory)

    Metricsets of each module. For the full metricset list, take a look at the official documentation.

  • modules.[module_name].[metric_name].reporting_interval (string, mandatory)

    String containing period between two metricsets collection. For example: 10s, 1m, 1h

  • modules.[module_name].[metric_name].hosts (string)

    Hosts, required by some modules such as zookeeper and kafka (see the sketch at the end of this section). For details, take a look at the official documentation.

  • elasticsearch (map)

    This section enables the elasticsearch reporter.

  • elasticsearch.cluster_id (string, mandatory)

    Name of the elasticsearch cluster used to store system metrics

  • kafka (map)

    When present, this section enables the kafka metrics reporter

  • kafka.cluster_id (string, mandatory)

    Name of the kafka cluster

  • kafka.topic_name (string, mandatory)

    Name of the kafka topic to store metrics from metricbeat

  • install_dir (string)

    STANDALONE Mandatory

    path to the metricbeat directory containing metricbeat [bin/] directory, providing metricbeat client shells and binaries, and metricbeat [log/] directory.
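To illustrate a module requiring hosts, a hedged sketch (the module layout follows the examples above; the host value is hypothetical):

"modules" : {
    "zookeeper" : {
        "zookeeper_metrics" : {
            "metricsets" : ["mntr"],
            "reporting_interval" : "1m",
            "hosts" : ["node01:2181"]
        }
    }
}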

Packetbeat

Packetbeat is a small component that sends network transaction data to

an Elasticsearch cluster:

"packetbeat" : {
    "reporting_interval" : 30,
    "interfaces" : [
        { "hosts" : ["node01"], "interface" : "eth0" },
        { "hosts" : ["node02"], "interface" : "any" }
    ],
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"packetbeat" : {
    "reporting_interval" : 30,
    "interfaces" : [
        { "hosts" : ["node01"], "interface" : "eth0" },
        { "hosts" : ["node02"], "interface" : "any" }
    ],
    "kafka" : {
        "cluster_id" : "local"
    }
}
  • reporting_interval (integer)

    Interval in seconds used by packetbeat to report.

  • elasticsearch (map)

    This section enables the elasticsearch reporter

  • elasticsearch.cluster_id (string, mandatory)

    Name of the elasticsearch cluster used to store the data collected by packetbeat.

  • kafka (map)

    This section enables the kafka reporter

  • kafka.cluster_id (string, mandatory)

    Name of the kafka cluster

  • install_dir (string)

    STANDALONE Mandatory

    Path to the packetbeat directory containing the packetbeat [bin/] directory, providing the packetbeat client shells and binaries, and the packetbeat [log/] directory.