
punchplatform-deployment.settings

Overview

The punchplatform-deployment.settings file is required to deploy a new production PunchPlatform. Declare in it how many Elasticsearch, Kafka, Spark, etc. nodes you need; the punchplatform deployer tool will then install everything where needed.

The punchplatform-deployment.settings file contains:

  • the names, versions, hosts, ports and URLs of all inner services (e.g. Storm, Zookeeper, ...)
  • the folders where to store software, data and logs
  • the unix users in charge of running services or executing administration actions
  • some key configuration parameters (e.g. number of Storm workers, JVM xmx, LDAP credentials, ...)

This file is required by the PunchPlatform deployer to generate a complete ansible inventory, in turn used to fully deploy your platform.

Important

In this file, node names are provided as server lists or dictionary keys in the various clusters. These host names are used to reach the machines to deploy from the deployment environment, and must therefore be resolvable and reachable from that environment.

When no other specific setting exists to indicate the network interface on which a service will be bound, the node hostnames may also be used by the cluster frameworks to communicate with each other; they should therefore resolve to the production interface from these machines, to avoid production data flowing through administration networks.

Location

The punchplatform-deployment.settings configuration file must be located in a platforms/<platformName> sub-folder of your deployment configuration directory, where platformName is typically 'production'. A symbolic link named punchplatform-deployment.settings must next be set from the configuration root folder. Remember you have a PUNCHPLATFORM_CONF_DIR environment variable defining that location.

When using the PunchPlatform command-line tools, the PunchPlatform configuration root folder must be provided using the PUNCHPLATFORM_CONF_DIR environment variable. That is, it must look like this:

> $PUNCHPLATFORM_CONF_DIR
    ├── punchplatform-deployment.settings -> platform/singlenode/punchplatform-deployment.settings
    └── platform
        └── singlenode
            └── punchplatform-deployment.settings

Note

To deploy a new platform, remember you start by creating a configuration folder on your deployer host. You must then set the PUNCHPLATFORM_CONF_DIR environment variable to point to that directory. That variable is expected to be correctly set by the deployer and platform command-line tools. Refer to the manual pages.

The reason for using a symbolic link is to let you later switch from one platform to another while keeping the same tenant and channel configurations. It is extremely convenient to test your channels on a secondary test platform, then apply them to your production platform.

After the deployment completes, some of your target servers, the ones acting as administration servers, will be equipped with similar configuration folders. The PUNCHPLATFORM_CONF_DIR environment variable will be set as well on these servers. These folders, usually located under /opt/soc_conf or /data/soc_conf, are actually git clones of a central git repository, and will be used at runtime by the platform to start and/or monitor your channels. All of that is set up for you by the deployer. For now, keep in mind that you are only defining the folders and files needed for deployment.

Conventions

Best practices

  • File format

    This file is a JSON file in which you are free to add # prefixed comments (this is not standard JSON, though).
    Do not forget the enclosing brackets ({}) and pay attention to the commas (,) at the end of each property block and at the end of the file. You can check the syntax with sed 's/#.*//g' punchplatform-deployment.settings | jq . A minimal sketch is given after this list.

  • Value type

    Remember that in JSON, surrounding a number with quotes changes its type from Number to String.

  • Avoid any encoding issue

    To avoid any encoding issue, use only upper/lower case non-accented alphanumeric characters for all your ids, hostnames, cluster names, tags and so on.
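
Here is a minimal, hypothetical sketch illustrating these best practices: a # prefixed comment, an unquoted Number and a quoted String. The section and setting names are taken from the examples later on this page; the fragment is not a deployable configuration.

{
    # comments are stripped before the JSON is parsed
    "platform": {
        "platform_id": "punchplatform-primary"
    },
    "kafka": {
        "clusters": {
            "local": {
                # a Number: no surrounding quotes
                "default_partitions": 2,
                # a String: quotes are required
                "kafka_brokers_jvm_xmx": "512M"
            }
        }
    }
}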

Documentation conventions

Each component provides an example configuration that needs to be adapted to your environment.
The list of parameters follows.

  • name.of.parameter: Type Default value: DefaultValueIfExist

    Description of the parameter.

If the Type is in bold, the parameter is mandatory when this settings section is present.

Hostname resolution

The hostname values used to configure components (like zookeeper or elasticsearch node names) must EXACTLY match the result of executing the hostname command on the corresponding servers.

punchplatform-deployment.settings Content

Each section that follows describes one part of the punchplatform-deployment.settings file.

Platform

Mandatory section

Each platform is associated with a few configuration properties. These are grouped in a dictionary section. In particular each platform is thus uniquely identified. This identifier appears in turn in metrics tags, typically forwarded from one platform to another.

This section also defines key locations and the users to be set up on all your target servers.

"platform": {
    "platform_id" : "punchplatform-primary",
    "setups_root": "/data/opt",
    "remote_data_root_directory": "/data",
    "remote_logs_root_directory": "/var/log/punchplatform",
    "punchplatform_daemons_user": "punchplatform",
    "punchplatform_group": "punchplatform",
    "binaries_version": "punchplatform-binaries-6.1.0-SNAPSHOT",
    "reporters" : ["myreporter"]
}
  • platform_id String

    MANDATORY
    The unique platform identifier. This identifier must be globally unique.

  • binaries_version String

    MANDATORY

    Version of the Punchplatform binaries package.

  • setups_root String

    MANDATORY

    Root folder where all software packages will be installed on the target machines. It must match the install dirs in the punchplatform.properties configuration file.

  • remote_data_root_directory String

    MANDATORY

    The root data directory. That folder will contain elasticsearch, zookeeper, kafka, etc. data. It must be mounted on a partition with enough disk capacity.

  • remote_logs_root_directory String

    MANDATORY

    The root log folder.

  • punchplatform_daemons_user String

    MANDATORY

    The unix daemon user in charge of running the various platform services. This user is non-interactive and will not be granted a home directory.

  • punchplatform_group String

    MANDATORY

    The user group associated with all users (daemons or operators) set up on your servers.

  • punchplatform_conf_repo_branch String

    OPTIONAL

    By default, the deployer assumes you use the git branch checked out in your configuration directory. It will clone that branch on the servers defined with a monitoring or administration role, i.e. a role that requires a configuration folder to be installed. Use this property to clone another branch.

  • reporters String[]

    MANDATORY
    A list of reporters, referenced by id, used by the operator. Ids must be declared in the dedicated 'reporters' section.

    For the elasticsearch reporter, logs will be sent to the "platform-logs-[yyyy.MM.dd]" index and the metrics to "platform-metrics-[yyyy.MM.dd]".

Ansible Inventory Settings

When you use the punch deployer tool, it may be necessary to provide additional ansible parameters required by your specific environment. This is the purpose of this section.

    "ansible" : {
        "ansible_inventory_settings" : "[punchplatform_cluster:vars]ansible_ssh_port=8022"
    }
  • ansible_inventory_settings

    Optional: this setting can be used to define additional settings for the ansible deployment, for instance ansible_ssh_port, ansible_ssh_user, etc.

Reporters

This section lets you define connectors (called reporters) used by multiple punch components to send important traces, logs and monitoring metrics. Because these connectors often share the same configuration for multiple uses in a platform, they are defined in this dedicated section. They can then be referred to in the following settings sections by their id (i.e. the key in this 'reporters' dictionary):

  • punchplatform_operator section : every start/stop command will be traced, along with useful information.
  • shiva section : the shiva service will also log its own actions and redirect its child jobs logs.
  • gateway section : as for shiva, the gateway will log its internal information.
"reporters": {
      "central_reporter" : {
        "type": "kafka",
        "bootstrap.servers": "node02:9092",
        "topic": "reporter-topic",
        "metric_document_field_name": "log",
        "reporting_interval": 30,
        "encoding": "lumberjack"
      },
      "debug_reporter" : {
        "type" : "elasticsearch",
         "cluster_name": "es_search"
      }
   },
  • <reporterId> Reporter

    Describes a specific reporter configuration. Use the reporter id to reference it in other components (Shiva, Platform, ...). To see all the available reporters, please refer to the dedicated reporter section. An example follows.
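
For instance, reusing the central_reporter and debug_reporter ids declared above, other sections simply list the ids they need (a sketch only; the platform and shiva settings themselves are documented elsewhere on this page):

"platform" : {
    # ... other platform settings ...
    "reporters" : ["central_reporter"]
},
"shiva": {
    "clusters": {
        "common": {
            # ... other cluster settings ...
            "reporters": ["central_reporter", "debug_reporter"]
        }
    }
}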

PunchPlatform Operator

This section drives the deployment of the tools used to operate/administrate the platform, either by a human or through a Punchplatform Gateway Web API (for some operations, e.g. editing resources through the Kibana Punchplatform plugin). The command-line operation environment can be deployed on an administration server or on an operator workstation.

"punchplatform_operator" : {
    "punchplatform_operator_environment_version": "punch-operator-6.1.0-SNAPSHOT",
    "configuration_name_dir_from_home" : "pp-conf",
    "operators_username" : ["admin1","admin2"],
    "servers" : {
        "node01" : {}
    },
    "storage": {
      "type": "kafka",
      "kafka_cluster": "local"
    }
},

Important

We strongly recommend using a git repository to keep your PunchPlatform configuration safe. Take a look at the git_settings section.

  • configuration_name_dir_from_home

    Mandatory

    Name of the directory which contains the tenants configuration

  • operators_username

    Optional

    In addition to punchplatform_admin_user, all custom users used to administrate the PunchPlatform

  • servers

    Mandatory

    Dictionary describing the servers used by operators to administrate the punchplatform. Usually, these servers are workstations.

    Important

    Punchplatform gateways may also act as proxies for operator actions requested through web API (e.g. through Kibana punchplatform plugin). In this case, it is necessary to also deploy operator environment on gateway servers.

  • servers.<server_id>.properties_source_path

    Optional

    Path of a punchplatform.properties file on the deployer machine. The file will be placed inside configuration_name_dir_from_home of each operator user.

  • punchplatform_version

    Mandatory

    Version of PunchPlatform

  • punchplatform_conf_repo_git_local_url

    Optional

    Absolute path used to access the git bare repository. Mandatory if the git_settings section is not defined (a configuration that is not recommended).

  • punchplatform_operator_environment_version

    Mandatory

    Version of the punchplatform operator environment. To start/stop channels/jobs, the punchplatform operator needs several libraries and shell scripts. This operator environment package provides all the needed scripts and jars.

  • storage.type

    Mandatory

    Describes in which type of storage the operator information will be stored: file (data stored on the filesystem) or kafka.

  • storage.kafka_cluster

    Mandatory (but only relevant when type is 'kafka'). Identifier of the kafka cluster in which the operator will store its internal management and synchronization data. This must be one of the keys of the kafka.clusters dictionary documented in the Kafka section.

Zookeeper

ZooKeeper is a distributed coordination service used by Storm and Kafka. It is not used by punch components. It exposes its service to client applications as a distributed filesystem.

"zookeeper" : {
    "zookeeper_version" : "apache-zookeeper-3.5.5-bin",
    "zookeeper_nodes_production_interface" : "eth0",
    "zookeeper_childopts" : "-server -Xmx256m -Xms256m",
    "zookeeper_admin_cluster_name": "common",
    "clusters" : {
        "common" : {
            "hosts" : ["node01","node02","node03"],
            "cluster_port" : 2181,
            "punchplatform_root_node" : "/punchplatform-primary"
        }
    }
},
  • zookeeper_version

    MANDATORY
    the zookeeper version

  • zookeeper_admin_cluster_name: String

    MANDATORY
    The zookeeper name of the admin cluster (for instance : common). The admin cluster is the one in which the applicable version of the configuration will be stored (see Administration overview)

  • zookeeper_nodes_production_interface: String

    MANDATORY
    Zookeeper production network interface

  • zookeeper_childopts: String "-server -Xmx1024m -Xms1024m"

    MANDATORY
    JVM options for Zookeeper default "-server -Xmx1024m -Xms1024m"

  • clusters.<clusterId> : String

    MANDATORY
    The clusterId is a string composed of alphanumeric characters and [-]. It is used by PunchPlatform command-line tools and various configuration files to refer to the corresponding cluster.

    There can be one or several zookeeper.clusters.[clusterId] sections, depending on your platform(s) setup. Multiple clusters are typically used to define several zones with different security levels and data flows restrictions.

    The clusterIds must be unique in the scope of a punchplatform. Note that if you define only one zookeeper cluster in your platform, most PunchPlatform commands will automatically use it as the default cluster, without the need to provide explicit identifiers.

  • clusters.<clusterId>.hosts : String[]

    MANDATORY
    Zookeeper server hostnames part of this zookeeper cluster.

    At least 3 must be provided for resilience, and only an ODD number can be provided.
    This is a zookeeper requirement to avoid split-brain scenarios.

    These hostnames are used by PunchPlatform commands that need to find
    an available node in the Zookeeper cluster. This parameter should match the actual list of servers configured in the running zookeeper cluster (see zoo.cfg file in your zookeeper cluster configuration).

  • clusters.<clusterId>.cluster_port : Number

    MANDATORY
    TCP port used by the Zookeeper cluster, i.e. all Zookeeper nodes will bind that port to communicate with client applications as well as with each other.

  • clusters.<clusterId>.punchplatform_root_node : String

    MANDATORY
    Defines the Zookeeper root path, starting with /.
    All PunchPlatform Zookeeper data will be stored under this path.

Elasticsearch

Elasticsearch is a document based database. It indexes JSON documents to provide advanced search capabilities. In particular, it provides the data backends (business data or metrics data) of the Kibana frontend applications.

"elasticsearch" : {
    "elasticsearch_version" : "7.8.0",
    "clusters" : {
        "es_search" : {
            "nodes" : {
                "node01" : {
                    "http_api_address" : "node01",
                    "transport_address" : "node01",
                    "bind_address" : "_eth1_",
                    "rack_id" : "1"
                },
                "node02" : {
                    "http_api_address" : "node02",
                    "transport_address" : "node02",
                    "bind_address" : "_eth1_",
                    "rack_id" : "2"
                },
                "node03" : {
                    "http_api_address" : "node03",
                    "transport_address" : "node03",
                    "bind_address" : "_eth1_",
                    "rack_id" : "3"
                }
            },
            "http_api_port" : 9200,
            "cluster_production_transport_address" : "node0a",
            "transport_port" : 9300,
            "minimum_master_nodes": 1,
            "settings_by_type" : {
                "data_node": {
                    "max_memory": "2048m",
                    "modsecurity_enabled": false,
                    "modsecurity_blocking_requests": false,
                    "script_execution_authorized": true,
                    "http_cors_enabled" : true,
                    "readonly" : true
                }
            },
            "plugins": {
              "opendistro_security": {
                  "opendistro_security_version": "1.4.0.0",
                  "local_ssl_certs_dir": "/tmp/certs/es",
                  "ssl_transport_pemkey_name": "node-key-pkcs8.pem",
                  "ssl_transport_pemcert_name": "node-cert.pem",
                  "ssl_transport_pemtrustedcas_name": "rootca-cert.pem",
                  "admin_pemcert_name": "admin-cert.pem",                
                  "admin_pemkey_name": "admin-key.pem",                  
                  "admin_pemtrustedcas_name": "cachain.pem",
                  "authcz_admin_dn": ["emailAddress=admin@thalesgroup.com,CN=admin,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"],
                  "nodes_dn": ["emailAddress=node@thalesgroup.com,CN=node,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"],
                  "kibana_index": ".kibana-domainname"
              }
            }
        }
    }
},
  • elasticsearch_version

    Mandatory : version of Elasticsearch

  • clusters.<clusterId>: String

    MANDATORY
    Alphanumeric characters to uniquely identify and refer to a given Elasticsearch cluster.

    When Elasticsearch is configured, that clusterId is also used for generating metrics names.

  • clusters.<clusterId>.nodes.<nodeHostname>: String

    MANDATORY
    Hostnames of the nodes composing the Elasticsearch cluster.

  • clusters.<clusterId>.nodes.<nodeHostname>.http_api_address: String

    MANDATORY
    FQDN (domain) or IP of the REST API provided by the Elasticsearch node on the production network.

  • clusters.<clusterId>.nodes.<nodeHostname>.transport_address: String

    FQDN (domain) or IP of the internal Elasticsearch communication port exposed on the production network (reachable by the other nodes of the Elasticsearch cluster).

    This parameter is used for PunchPlatform channel deployment, when using transport protocol for documents indexation, in order to directly send data from storm topologies to the cluster data nodes.

  • clusters.<clusterId>.nodes.<nodeHostname>.bind_address: String

    Default: "ES interface" When provided, this parameter defines the network address(es) to which the elasticsearch node will bind (and therefore
    listen for incoming request). This can be provided in all forms supported by the elasticsearch host parameter.

    If not provided, the bind address will be determined using default
    elasticsearch production interface provided in the deployment.settings.

  • clusters.<clusterId>.nodes.<nodeHostname>.type: String

    Default: "data_node" The type of the node: data_node, master_node, client_node or only_data_node (ie data node without master role enabled).

    This parameter is used for PunchPlatform cluster deployment, in order to automatically configure the Elasticsearch cluster nodes to set the data and master directive in the elasticsearch configuration file.

  • clusters.<clusterId>.nodes.<nodeHostname>.tag: String

    Default: ""
    Nodes can be tagged. It allows indices placement, i.e an index can be placed on a particular set of nodes according to their tags. It uses node.box_type Elasticsearch property.

    It can be used into Hot/Warm architectures for example: Index/tags mapping has to be declared on Elasticsearch mappings (or mapping templates in PunchPlatform) in parameter settings.index.routing.allocation.require.box_type.

  • clusters.<clusterId>.nodes.additional_jvm_options: String

    Default: "" Add JVM options for a specific elasticsearch node. Overrides the additional_jvm_options parameter set in cluster section

  • clusters.<clusterId>.http_api_port: Integer

    MANDATORY
    Listening port number of the REST API provided by the elasticsearch node on the production network

  • clusters.<clusterId>.cluster_production_transport_address: String

    MANDATORY
    Network address or interface name of the cluster when using transport protocol (to be used with transport_port parameter).
    This address will be used for example for metrics-containing clusters, to allow access by the channel health monitoring service of the PunchPlatform admin server.

  • clusters.<clusterId>.api_hosts_for_monitoring: String[]

    Optional array of string of the form "host:port" can be provided when the platform monitoring daemon (shiva cluster) is not able to directly reach the Elasticsearch API using the hosts and ports from "nodes" settings.
    This is the case if this shiva cluster is running in a separate admin area, with no routing to the Elasticsearch cluster production network interface.

    e.g.

    "api_hosts_for_monitoring" : [ "myelasticsearchvip.admin.network:9200"]
    

  • clusters.<clusterId>.transport_port: Integer

    MANDATORY Listening port number of the internal Elasticsearch communication port exposed on the production network (reachable by the other nodes of the Elasticsearch cluster).

  • clusters.<clusterId>.minimum_master_nodes: Integer

    MANDATORY Defines the quorum of master-eligible nodes required to prevent split-brain situations (quorum = (n/2)+1). For example, with 3 master-eligible nodes, set this to 2.

  • clusters.<clusterId>.recover_after_nodes: Integer

    Default: 0 Recover as long as this many data or master nodes have joined the cluster.

  • clusters.<clusterId>.expected_nodes: Integer

    Default: 0 The number of (data or master) nodes that are expected to be in the cluster. Recovery of local shards will start as soon as the expected number of nodes have joined the cluster.

  • clusters.<clusterId>.recover_after_time: String "5m"

    If the expected number of nodes is not achieved, the recovery process waits for the configured amount of time before trying to recover regardless. Defaults to 5m if one of the expected_nodes settings is configured.

  • clusters.<clusterId>.additional_jvm_options: String

    Default: "" Add JVM options to each node from elasticsearch cluster

  • clusters.<clusterId>.settings_by_type: Object

    define settings for data_nodes

  • clusters.<clusterId>.settings_by_type.client_type: String "data_node"

    Refers to the Elasticsearch node type. By default it is data_node. It can also be client_node.

  • clusters.<clusterId>.settings_by_type.client_type.max_memory String

    Maximum size of each Elasticsearch node's JVM memory. A rule of thumb is half the size of the VM RAM, assuming one elasticsearch server per VM. It should be kept below 32G to avoid forcing the JVM to use large pointers and memory tables.

  • clusters.<clusterId>.settings_by_type.client_type.modsecurity_enabled: Boolean

    Enable (true) or disable (false) the installation and the configuration of modsecurity

  • clusters.<clusterId>.settings_by_type.client_type.modsecurity_blocking_requests: Boolean true

    During the integration of a PunchPlatform, set this to false to run modsecurity in non-blocking mode.

  • clusters.<clusterId>.settings_by_type.client_type.script_execution_authorized: Boolean

    Enable (true) or disable (false) the execution of scripts through elasticsearch. This setting must be set to true to display all grafana dashboards properly. We recommend setting it to false on customer client elasticsearch clusters for security purposes.

  • clusters.<clusterId>.settings_by_type.client_type.http_cors_enabled: Boolean

    Enable (true) or disable (false) cross-origin resource sharing, i.e. whether a browser on another origin can do requests to Elasticsearch

  • clusters.<clusterId>.settings_by_type.client_type.readonly: Boolean

    Enable (true) or disable (false) readonly modsecurity. It will deny search, visualization and dashboard creation for the user.

  • clusters.<clusterId>.override_elasticsearch_version: String

    In some cases, especially after an elasticsearch migration using the snapshot mechanism, you may want to switch the elasticsearch version for only one cluster, usually the query one.

  • clusters.<clusterId>.supervisor: Undefined

    Elasticsearch nodes are supervised by supervisor. Its logrotate parameters can be configured in this section.
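
As a sketch of the per-node settings described above (type, tag and additional_jvm_options; all values are hypothetical), a three-node layout mixing roles could look like this:

"nodes" : {
    "node01" : {
        "http_api_address" : "node01",
        "transport_address" : "node01",
        # dedicated master node, holds no data
        "type" : "master_node"
    },
    "node02" : {
        "http_api_address" : "node02",
        "transport_address" : "node02",
        # data node placed in the 'hot' group (node.box_type)
        "type" : "data_node",
        "tag" : "hot",
        "additional_jvm_options" : "-Xms2g -Xmx2g"
    },
    "node03" : {
        "http_api_address" : "node03",
        "transport_address" : "node03",
        # data node placed in the 'warm' group
        "type" : "data_node",
        "tag" : "warm"
    }
}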

Modsecurity

Modsecurity is an Apache module that protects your Elasticsearch cluster against deletions and integrity violations.

{
    "elasticsearch": {
        "modsecurity" : {
            "modsecurity_production_interface" : "eth0",
            "port" : 9100,
            "domains" : {
                "admin": {
                    "elasticsearch_security_aliases_pattern": "events-mytenant-kibana-[-a-zA-Z0-9.*_:]+",
                    "elasticsearch_security_index_pattern": "events-mytenant-[-a-zA-Z0-9.*_:]+"
                }
            }
        }
    }
}
  • modsecurity.modsecurity_production_interface

    Mandatory: interface used by modsecurity on the target host.

  • modsecurity.port

    Mandatory: port used by Apache for modsecurity.

  • modsecurity.<domain_name>

    Mandatory: name of the client. It must match the corresponding kibana domain name.

  • modsecurity.<client_name>.elasticsearch_security_index_pattern

    Mandatory

    regexp on the name of the index for modsecurity configuration. This parameter is used to restrict requests for access to data. The purpose is to prevent any access to other indexes than the user profile that accesses this specific kibana domain/instance is entitled to.

    This parameter MUST match all indexes that contain data allowed to the user, not only aliases which names the user 'sees' in the Kibana interface. For example, if the kibana provides an 'index pattern' that in fact is an alias (e.g. : events-mytenant-kibana-bluecoat-lastmonth), the pattern must match underlying indexes that contain the data (e.g. : events-mytenant-bluecoat-2017.07.05 ).

    This is because Kibana will determine which indexes contain useful data within a 'user level' alias, and will issue unitary requests to only the underlying indexes that hold data matching the query time scope.

    To configure what aliases the user is allowed to see/uses at his Graphical User Interface level, please provide a different value to the 'elasticsearch_security_aliases_pattern'.

    If non-wildcard index patterns are used in Kibana, then this setting MUST also match the said index patterns, which will be queried 'directly' by kibana, without making any difference between indexes and aliases. Example: if a user has authorized data in indexes named following the 'events-mytenant--' pattern, but sees them only through aliases named following the 'events-mytenant-kibana-' pattern, then the setting should be: TODO

    To authorize everything please fill TODO

  • modsecurity.<client_name>.elasticsearch_security_aliases_pattern

    Optional

    Regexp on the name of the user-level aliases for modsecurity configuration. This setting MUST be provided if the user is allowed only to select some aliases within his kibana instance, instead of actually using indexes pattern that match real unitary indexes names.

    If this setting is not provided, then it will default to the 'elasticsearch_security_index_pattern' setting value, and may lead to kibana malfunction or Elasticsearch overuse, especially if the provided value to this other setting is in fact an aliases pattern.

    If you want to force kibana to use pre-flight requests to determine the actual low-level indexes useful to query against a time scope, then the kibana index pattern must contain a '*' and therefore this setting should enforce the presence of a '*'.

    Example: if a user has authorized data in indexes named following the 'events-mytenant--' pattern, but sees them only through aliases named following the events-mytenant-kibana-<technoname> pattern, then the setting should be: events-mytenant-kibana-[-.:0-9a-zA-Z*_]*[*][-.:0-9a-zA-Z*_]*.

    To authorize everything please fill TODO

Info

Do not forget to edit your Elasticsearch section in punchplatform.properties to enable Punch Security:

{
    "elasticsearch" : {
        "clusters" : {
            "es_search" : {
                "settings_by_type" : {
                    "data_node": {
                        "modsecurity_enabled": true,
                        "modsecurity_blocking_requests": true
                    }
                }
            }
        }
    }
}

Plugins

  • clusters.<clusterId>.plugins: Object

    Elasticsearch plugins.

Opendistro Security
  • clusters.<clusterId>.plugins.opendistro_security.opendistro_security_version

    Mandatory : version of Opendistro Security plugin for Elasticsearch. Trigger the plugin installation during Elasticsearch deployment.

  • clusters.<clusterId>.plugins.opendistro_security: Object

    Elasticsearch Opendistro Security plugin configuration.

  • clusters.<clusterId>.plugins.opendistro_security.local_ssl_certs_dir: Mandatory, String

    Directory located on the deployer's system containing all the SSL keys and certificates that will be used by Opendistro Security for Elasticsearch to encrypt ES transport protocol.

  • clusters.<clusterId>.plugins.opendistro_security.ssl_transport_pemkey_name: Mandatory, String

    Certificate key file name with extension for nodes. Must be different from admin's certificate key file and must be provided with PKCS8 format. Example: node-key-pkcs8.pem

  • clusters.<clusterId>.plugins.opendistro_security.ssl_transport_pemcert_name: Mandatory, String

    Certificate file name with extension for nodes. Must be different from admin's certificate file. Example: node-cert.pem

  • clusters.<clusterId>.plugins.opendistro_security.ssl_transport_pemtrustedcas_name: Mandatory, String

    Trusted Certificate Authority file name with extension for nodes. Must be different from admin's trusted certificate authority file. Example : rootca-cert.pem

  • clusters.<clusterId>.plugins.opendistro_security.admin_pemkey_name: Mandatory, String

    Certificate key file name with extension for security administration. Must be different from node's certificate key files and must be provided with PKCS8 format. Example: admin-key-pkcs8.pem

  • clusters.<clusterId>.plugins.opendistro_security.admin_pemcert_name: Mandatory, String

    Certificate file name with extension for security administration. Must be different from node's certificate files. Example: admin-cert.pem

  • clusters.<clusterId>.plugins.opendistro_security.admin_pemtrustedcas_name: Mandatory, String

    Trusted Certificate Authority file name with extension for security administration. It can be the same as, or different from, the node's trusted certificate authority file. Example : rootca-cert.pem

  • clusters.<clusterId>.plugins.opendistro_security.authcz_admin_dn: Mandatory, String[]

    Distinguished Name (DN) for admin client authentication and corresponding to admin certificate's DN. Used to update security configurations. Example ["emailAddress=admin@thalesgroup.com,CN=admin,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"]

  • clusters.<clusterId>.plugins.opendistro_security.nodes_dn: Mandatory, String[]

    Distinguished Name (DN) for nodes and corresponding to nodes certificate's DN. Example ["emailAddress=node@thalesgroup.com,CN=node,OU=SAS,O=TS,L=VLZ,ST=Paris,C=FR"]

  • clusters.<clusterId>.plugins.opendistro_security.ssl_http_enabled: Optional, Boolean

    Default false. If true, enables SSL encryption for Elasticsearch's Rest API.

  • clusters.<clusterId>.plugins.opendistro_security.ssl_http_pemkey_name: Optional, String

    Certificate key file name with extension for https. Example: rest-key-pkcs8.pem

  • clusters.<clusterId>.plugins.opendistro_security.ssl_http_pemcert_name: Optional, String

    Certificate file name with extension for https. Example: rest-cert.pem

  • clusters.<clusterId>.plugins.opendistro_security.ssl_http_pemtrustedcas_name: Optional, String

    Trusted Certificate Authority file name with extension for https. Example : rootca-cert.pem

  • clusters.<clusterId>.plugins.opendistro_security.ssl_http_clientauth_mode: Optional, String

    Default OPTIONAL. Value is NONE, OPTIONAL or REQUIRE. Authentication mode for https.

  • clusters.<clusterId>.plugins.opendistro_security.kibana_index: String, Optional

    Must match the name of the Kibana index from kibana.yml. Default is .kibana . Example .kibana-admin

Info

Deploying Open Distro Security on your Elasticsearch cluster configures the security features once. Any further configuration change on your filesystem requires a manual action to reload the security settings across the cluster.

Opendistro Alerting
  • clusters.<clusterId>.plugins.opendistro_alerting.opendistro_alerting_version

    Mandatory : version of Opendistro Alerting plugin for Elasticsearch. Trigger the plugin installation during Elasticsearch deployment.

Kibana

Kibana is a front end application that allows a user to search and display data from Elasticsearch.

{
  "kibana": {
    "kibana_version" : "7.8.0",
    "repository": "http://fr.archive.ubuntu.com/ubuntu/",
    "domains": {
      "admin": {
        "es_cluster_target": "es_search",
        "es_type_of_nodes_targeted": "data_node",
        "kibana_port": 5601,
        "type": "administration",
        "index": ".kibana-override-name",
        "plugins": {
          "punchplatform": {
            "punchplatform_version": "6.0.0",
            "rest_api": {
              "hosts": ["http://server1:4242/v1"]
            }
          }
        }
      }
    },
    "servers": {
      "node01": {
        "address": "0.0.0.0"
      },
      "node02": {
        "address": "node02"
      },
      "node03": {
        "address": "node03"
      }
    }
  }
}
  • kibana_version

    Mandatory : version of Kibana

  • repository:

    Optional, String: package repository URL. Mandatory if chrooted is enabled.

  • domains.<domainName>: String

    Dictionary defining each instance (domain) of kibana.

  • domains.<domainName>.es_cluster_target: String

    Cluster name that the kibana is allowed to contact

  • domains.<domainName>.es_type_of_nodes_targeted: String

    Elasticsearch node type that the kibana is allowed to contact

  • domains.<domainName>.kibana_port: Integer

    TCP port used to access kibana over HTTP

  • domains.<domainName>.type: String "external_customer"

    The default value is external_customer, which connects the kibana through the front reverse proxy. The other value, administration, connects the kibana through the admin proxy network (and it can also be accessed directly from that network).

  • domains.<domainName>.servers: String[]

    List of hostnames (no IPs). It allows starting kibana instances only on the described hosts. To deploy this feature properly: 1) fill the kibana.servers section with ALL kibana servers, 2) list in kibana.domains.<domainName>.servers the subset of these servers on which the domain must start. See the sketch at the end of this parameter list.

  • domains.<domainName>.index: String, Optional

    Override kibana index name used to store visualizations. Default is : kibana-<domain_name>

  • servers.<serverName>: String

    Dictionary holding the parameters of each kibana server.

  • servers.<serverName>.address: String

    Interface used to bind the kibana process

  • chrooted: Boolean false

    Set to true to enable this feature. It is only taken into account if a single domain is specified. If set to false, the running Kibana instances won't be jailed in a chroot. Recommended in production.

  • plugins: JSON object

    Refer to the following sections that describe kibana plugins configuration
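
To illustrate the relationship between kibana.servers and domains.<domainName>.servers described above, here is a hypothetical (incomplete) layout: all candidate servers are declared in kibana.servers, and a domain only starts on the subset it lists:

"kibana": {
    "kibana_version" : "7.8.0",
    "servers": {
        # ALL kibana servers must be declared here
        "node01": { "address": "node01" },
        "node02": { "address": "node02" },
        "node03": { "address": "node03" }
    },
    "domains": {
        "admin": {
            "es_cluster_target": "es_search",
            "es_type_of_nodes_targeted": "data_node",
            "kibana_port": 5601,
            # this domain only starts on a subset of the declared servers
            "servers": ["node01", "node02"]
        }
    }
}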

Kibana plugins

Kibana Punchplatform Feedback plugin

  • kibana.plugins.punchplatform_feedback.punchplatform_feedback_version

    Mandatory : version of Punchplatform Feedback plugin for Kibana. Trigger the plugin installation during Kibana deployment.

Kibana Opendistro Security plugin

  • kibana.plugins.opendistro_security.opendistro_security_version

    Mandatory : version of Opendistro Security plugin for Kibana. Trigger the plugin installation during Kibana deployment.

Kibana Opendistro Alerting plugin

  • kibana.plugins.opendistro_alerting.opendistro_alerting_version

    Mandatory : version of Opendistro Alerting plugin for Kibana. Trigger the plugin installation during Kibana deployment.

Kibana Punch plugin

Info

Punchplatform plugin for Kibana can be configured in two different sections :

  • kibana.plugins.punchplatform : global plugin configuration for all domains
  • kibana.domains.<domain_name>.plugins.punchplatform : local configuration applied to this domain, overriding the global configuration. All global configurations in kibana.plugins.punchplatform can be overridden in this section (see the sketch at the end of this parameter list).
  • kibana.plugins.punchplatform.punchplatform_version

    Mandatory : version of Punchplatform plugin for Kibana. Trigger the plugin installation during Kibana deployment.

  • kibana.plugins.punchplatform.rest_api.hosts : Mandatory, String array

    Backend REST API for Punch plugin

  • kibana.plugins.punchplatform.rest_api.base_path : Optional, String

    Define a base path to insert as a prefix in all requests path leading to the backend REST API

  • kibana.plugins.punchplatform.rest_api.request_timeout : Optional, Integer

    Timeout in ms for requests leading to the backend REST API

  • kibana.plugins.punchplatform.rest_api.custom_headers : Optional, String array

    Add custom headers in requests leading to the backend REST API

  • kibana.plugins.punchplatform.documentation_enabled : Optional, Boolean

    Default true. Enables documentation component

  • kibana.plugins.punchplatform.tools_enabled : Optional, Boolean

    Default true. Enables Punch tools component

  • kibana.plugins.punchplatform.extraction_enabled Optional, Boolean

    Default true. Enables Data extraction component

  • kibana.plugins.punchplatform.extraction_max_size Optional, Integer

    Default 1000000

  • kibana.plugins.punchplatform.extraction_outputs Optional, String array

    Default ['csv', 'json']. Formats output

  • kibana.plugins.punchplatform.punchline_enabled Optional, Boolean

    Default true. Enables Punchline editor component

  • kibana.plugins.punchplatform.platform_editor_enabled Optional, Boolean

    Default true. Enables Platform studio component

  • kibana.plugins.punchplatform.platform_editor_channel_management Optional, Boolean

    Default true. Enables Channel management for Platform studio component

SSL and authentication to the backend REST API are also configurable :

  • kibana.plugins.punchplatform.rest_api.username : Optional, String

    Define a username to authenticate to backend services through the backend REST API. This username is inserted inside the header "Authorization"

  • kibana.plugins.punchplatform.rest_api.password : Optional, String

    Define a password to authenticate to backend services through the backend REST API. This password is inserted inside the header "Authorization"

  • kibana.plugins.punchplatform.rest_api.ssl.enabled : Optional, Boolean

    Default true if the ssl section is defined. Enables SSL encryption for the connection to the backend REST API

  • kibana.plugins.punchplatform.rest_api.ssl.use_kibana_certificate : Optional, Boolean

    Default false if the ssl section is defined. If true, no SSL certificate configuration is needed and the plugin will use the Kibana certificates

  • kibana.plugins.punchplatform.rest_api.ssl.certificate : Optional, String

    SSL certificate file path for connection encryption to the backend REST API

  • kibana.plugins.punchplatform.rest_api.ssl.key : Optional, String

    SSL key file path for connection encryption to the backend REST API

  • kibana.plugins.punchplatform.rest_api.ssl.certificate_authorities : Optional, String array

    SSL CA files path for connection encryption to the backend REST API

  • kibana.plugins.punchplatform.rest_api.ssl.verification_mode : Optional, String

    none or full
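
As a sketch of the global/domain override mechanism mentioned at the beginning of this section (hypothetical values), the plugin can be configured once for all domains and selectively overridden per domain:

"kibana": {
    "plugins": {
        "punchplatform": {
            # global configuration, applies to every domain
            "punchplatform_version": "6.0.0",
            "rest_api": {
                "hosts": ["http://server1:4242/v1"]
            },
            "extraction_enabled": true
        }
    },
    "domains": {
        "admin": {
            "plugins": {
                "punchplatform": {
                    # domain-level override of the global value
                    "extraction_enabled": false
                }
            }
        }
    }
}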

SSL/TLS

Info

For SSL configuration, all key names MUST refer to the names inside local_ssl_certs_dir

{
  "kibana": {
    "domains": {
      "admin": {
        "local_ssl_certs_dir": "/tmp/certs/kibana/admin",
        "server_ssl_enabled": true,
        "server_ssl_key_name": "kibana-server-key.pem",
        "server_ssl_certificate_name": "kibana-server-cert.pem",
        "elasticsearch_ssl_enabled": true,
        "elasticsearch_ssl_verificationMode": "none",
        "elasticsearch_ssl_certificateAuthorities_names": ["server-cachain.pem"]
      }   
    } 
  }
}
  • kibana.domains.<domainName>.local_ssl_certs_dir: Mandatory, String

    Directory located in the deployer's system containing all the SSL keys and certificates that will be used by Kibana's current domain

  • kibana.domains.<domainName>.server_ssl_enabled: Optional, Boolean

    Default false. If true, enable ssl/tls for requests to kibana server

  • kibana.domains.<domainName>.server_ssl_key_name: Optional, String

    Name of the kibana server's key

  • kibana.domains.<domainName>.server_ssl_certificate_name: Optional, String

    Name of the kibana server's certificate

  • kibana.domains.<domainName>.elasticsearch_ssl_enabled: Optional, Boolean

    Default false. If true, enable ssl/tls for requests to Elasticsearch server

  • kibana.domains.<domainName>.elasticsearch_ssl_verificationMode: Optional, String

    Default full. Certificate's verification mode for requests to Elasticsearch server. Either none, certificate or full (default)

  • kibana.domains.<domainName>.elasticsearch_ssl_certificateAuthorities_names: Optional, Array

    Name of the CA files for trusted certificates, used for requests to Elasticsearch servers

Storm

"storm" : {
    "storm_version" : "apache-storm-2.1.0",
    "storm_nimbus_nodes_production_interface" : "eth0",
    "clusters" : {
        "main": {
            "master" : {
                "servers" : ["node01", "node02", "node03"],
                "cluster_production_address" : "node0a"
                },
            "ui" : {
                "servers" : ["node01", "node02", "node03"],
                "cluster_admin_url": "node0a:8080"
            },
            "slaves" : ["node01", "node02", "node03"],
            "zk_cluster" : "common",
            "zk_root" : "storm-1.2.2-main",
            "storm_workers_by_punchplatform_supervisor" : 15,
            "workers_childopts" : "-Xmx1G",
            "supervisor_memory_mb" : 8192,
            "supervisor_cpu" : 4
        }
    }
},
  • storm_version

    Mandatory: version of storm

  • storm_nimbus_nodes_production_interface

    Mandatory: network interface bound by storm nimbus (master) for production usage

  • storm_nimbus_jvm_xmx

    Optional

    Set the Xmx of the nimbus jvm default value: 1024m

  • storm_ui_jvm_xmx

    Optional

    Set the Xmx of the ui jvm default value: 256m

  • storm_supervisor_jvm_xmx

    Optional

    Set the Xmx of the storm supervisor jvm default value: 256m

  • clusters.<clusterId>: String

    The Storm cluster identifier. A string composed of letters and numbers. A single Punchplatform can contain several storm clusters.

  • clusters.<clusterId>.master.servers: String[]

    A comma-separated array of hostnames of the servers that will run Storm so-called nimbus processes in charge of scheduling the starting/stopping of topologies.

  • clusters.<clusterId>.master.thrift_port: Integer 6627

    The thrift TCP Port used for storm inter communication. Default value: 6627

  • clusters.<clusterId>.ui.servers: String[]

    A comma-separated array of hostnames of the servers that will run the Storm so-called ui server, providing the inbuilt monitoring Web interface and an associated REST API.

  • clusters.<clusterId>.ui.ui_port: Integer 8080

    The listening TCP Port of the Storm ui servers. Default value: 8080

  • clusters.<clusterId>.slaves: String[]

    A comma-separated array of hostnames of the servers that will run Storm so-called supervisor processes in charge of starting/stopping topologies, as requested by the nimbus.

  • clusters.<clusterId>.zk_cluster: String

    Identifier of the zookeeper cluster in which the Storm cluster will store its internal cluster management and synchronization data. This must be one of the keys in zookeeper.clusters dictionary documented previously.

  • clusters.<clusterId>.zk_root: String

    This string is a prefix (composed of letters, digits or '-') used as the root of all data paths in the zookeeper cluster for data associated with the storm cluster. This allows sharing a same zookeeper cluster between multiple Storm clusters; it should therefore be unique within a zookeeper cluster (both unique within the PunchPlatform system, and unique as compared to other zookeeper roots configured in other PunchPlatforms using the same zookeeper cluster). We recommend including the storm version to avoid issues during migration.

  • clusters.<clusterId>.storm_workers_by_punchplatform_supervisor: Integer

    This number indicates the number of Storm worker slots allowed on each storm slave node (i.e. each node running the Storm supervisor component). This field, multiplied by the JVM memory option of each worker (see the workers_childopts field hereafter), should not exceed the Storm slave server memory (see the sizing sketch at the end of this parameter list).

  • clusters.<clusterId>.workers_childopts: String

    This string provides the storm worker JVM options. If an empty string is provided, the default Storm settings will be applied to these JVMs. Storm workers are in charge of running topologies.
    This field, multiplied by the number of Storm slots (see the storm_workers_by_punchplatform_supervisor field above), should not exceed the Storm slave server memory.

  • clusters.<clusterId>.supervisor_memory_mb: Integer

    This number provides the size of RAM of the virtual or physical node. It is used to configure storm.yaml for storm supervisor.

  • clusters.<clusterId>.supervisor_cpu: Integer

    This number provides the number of CPU of the virtual or physical node. It is used to configure storm.yaml for storm supervisor.

  • clusters.<clusterId>.master.monitoring_interval: Integer 60

    The period (in seconds) of cyclical acquisition of metrics/supervisor status by Nimbus. Legacy deployments use 10, but increasing this value reduces the load of the nimbus service and improves the availability of the Storm UI/API.

  • clusters.<clusterId>.master.supervisor_timeout: Integer 90

    The timeout (in seconds) for declaring a supervisor non-nominal when nimbus monitors it. The legacy setting is 10s. Increasing this value in relation with 'monitoring_interval' helps avoiding false positives of failed supervisors under load. As a tradeoff, if a supervisor node previously assigned some topologies is lost, reassigning these topologies to a surviving supervisor node will take longer.

  • clusters.<clusterId>.supervisor: Undefined

    Storm components are supervised by supervisor. Its logrotate parameters can be configured in this section.

  • clusters.<clusterId>.published_storm_hostname_source: String

    This setting determines the name that storm will publish in zookeeper so that other nodes can contact this one. It MUST therefore be a name that resolves to the production interface of this node when resolved on the other cluster nodes. It can take different values: "inventory", storm publishes the hostname set in the configuration files as production interface; "server_local_fqdn", storm publishes the local server FQDN; "server_local_hostname", storm publishes the local server hostname; "auto" (the default), storm chooses which hostname to publish in zookeeper.
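
As an illustration of the sizing rule given for storm_workers_by_punchplatform_supervisor and workers_childopts, here is a sketch (the figures below are purely illustrative, not a recommendation):

# 8 worker slots x 2 GB per worker JVM = 16 GB of worker heap,
# which must fit within the slave server memory declared in supervisor_memory_mb
"storm_workers_by_punchplatform_supervisor" : 8,
"workers_childopts" : "-Xmx2G",
"supervisor_memory_mb" : 24576,
"supervisor_cpu" : 8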

Kafka

Kafka is a resilient and scalable queueing application. It stores documents for several days. It is usually used in front of storm to keep data safe.

"kafka" : {
    "kafka_version" : "kafka_2.11-2.4.0",
    "kafka_brokers_production_interface" : "eth0",
    "clusters" : {
        "local" : {
            "brokers_with_ids" : [
              {"id" : 1, "broker" : "node01:9092" },
              {"id" : 2, "broker" : "node02:9092" },
              {"id" : 3, "broker" : "node03:9092" }
            ],
            "zk_cluster" : "common",
            "zk_root" : "kafka-local",
            "brokers_config" : "punchplatform-local-server.properties",
            "default_replication_factor" : 1,
            "default_partitions" : 2,
            "partition_retention_bytes" : 1073741824,
            "partition_retention_hours" : 24,
            "kafka_brokers_jvm_xmx": "512M"
        }
    }
}
  • kafka_version

    Mandatory

    version of kafka

  • kafka_brokers_production_interface

    Mandatory

    network interface bound by kafka broker for production usage

  • clusters.<clusterId>: String

    clusterId is a string composed of alphanumeric characters and [_] which will be used each time this particular kafka cluster must be identified in a PunchPlatform command-line or configuration file, and also for metrics name generation when elasticsearch reporting is activated by PunchPlatform configuration.

    There can be one or multiple kafka.clusters.<clusterId> sections, depending on the overall deployment configuration (for example : in order to use different storage configuration for brokers that manage different kind of logs, or to ensure isolation of performance between different log channels). Kafka clusterIds must be unique in a PunchPlatform cluster.

    Please note that if only one kafka cluster is defined in the punchplatform properties file, most PunchPlatform commands will automatically use it without requiring the clusterId on the command line.

  • clusters.<clusterId>.brokers_with_ids: map[]

    Pairs of id and broker providing all kafka brokers in this cluster and their unique id.
    [ {"id" : 1, "broker" : "node01:9092" }, {"id" : 2, "broker" : "node02:9092" }, {"id" : 3, "broker" : "node03:9092" } ],
    When redeploying on existing nodes, the ids should be preserved to avoid data loss. Therefore, if migrating from the deprecated 'brokers' setting (with autogenerated ids), please fetch the previously deployed ids from your kafka nodes (broker.id setting in your cluster configuration, usually in /data/opt/kafka*/conf/punchplatform-<KafkaclusterId>-server.properties).

  • clusters.<clusterId>.zk_cluster: String

    String identifying the PunchPlatform zookeeper cluster that this kafka cluster will use to persist/exchange its internal configuration, topics, partitions and offsets. This must be one of the keys in zookeeper.clusters dictionary documented previously.

    This parameter will be used by all PunchPlatform kafka clients (producers and consumers) that will need to locate available kafka brokers for this cluster, because available clusters register themselves in zookeeper.

  • clusters.<clusterId>.zk_root: String

    This string is a prefix (composed of letters, digits or '-') that is used as root of all data path in the zookeeper cluster, for data associated to the kafka brokers cluster. This allows sharing a same zookeeper cluster for multiple Kafka brokers clusters ; therefore it should be unique within a zookeeper cluster (both unique within the PunchPlatform system, but also unique as compared to other zookeeper roots configured in other PunchPlatform for the same zookeeper cluster).

  • clusters.<clusterId>.brokers_config: String

    Path to the local kafka broker server configuration. This parameter is used by punchplatform-standalone.sh and punchplatform-kafka.sh when running a local kafka broker server in a PunchPlatform sample configuration. When using punchplatform cluster deployment tool, this field is used to generate the Kafka brokers cluster configuration on Kafka servers.

  • clusters.<clusterId>.default_replication_factor: Integer

    Default replication level for Kafka topic partitions. This is used whenever no replication factor is defined in the channel structure configuration (cf. Channels). A value of 1 means no replication, therefore no resilience in case of failure of a cluster broker.

  • clusters.<clusterId>.default_partitions: Integer

    Default number of partitions for each Kafka topic, used whenever no partitions number is defined in the channel structure configuration (cf. Channels).

    Partitions allow scaling the processing by sharding the responsibility of consuming Kafka messages between multiple consumer instances (if configured in the Storm topology).

  • clusters.<clusterId>.partition_retention_bytes: Long

    Maximum size-based retention policy for logs. Kafka applies the first condition reached (either time or size) to delete data, so this parameter is a failsafe to avoid a single channel filling up the platform storage in case of flooding of a topic.

    In a typical cluster setup, each channel PARTITION is limited, for example, to 1000 events per second x 1000 bytes x 2 days, i.e. a typical value of 172800000000 (bytes); or to 4000 logs per second for 2 days of flooding plus 1 day of additional nominal storage (2500 lps) with 3000 bytes per enriched log, i.e. 1099511627776 bytes for a tenant topology. See the worked example at the end of this parameter list.

  • clusters.<clusterId>.partition_retention_hours: Integer

    Maximum time-based retention policy (applied if the size-based retention policy is not triggered by the amount of data received).

  • clusters.<clusterId>.offsets_retention_minutes: Integer

    Default is 20160. Offsets older than this retention period will be discarded.

  • clusters.<clusterId>.kafka_brokers_jvm_xmx: Integer

    The max size allowed to each kafka broker JVM (this will be used by the kafka startup script).

  • clusters.<clusterId>.supervisor: Undefined

    Kafka nodes are supervised by supervisord. Its logrotate parameters can be configured in this section.
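
As a worked example of the size-based retention computation quoted above (first sizing case; the retention hours value below is illustrative):

# 1000 events/s x 1000 bytes x 2 days (172800 s) = 172800000000 bytes per partition
"partition_retention_bytes" : 172800000000,
# time-based fallback, here 2 days
"partition_retention_hours" : 48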

Shiva

Shiva is the distributed, resilient jobs/services manager used for tasks both at PunchPlatform system level (monitoring, housekeeping...) and at user processing level (channels).

Shiva is made of nodes communicating through a kafka cluster. Nodes can be leaders (masters of the cluster), runners (task executors) or both. The operator commands will be available on the PunchPlatform operators linux accounts.

"shiva": {
    "shiva_version": "punchplatform-shiva-6.1.0-SNAPSHOT",
    "clusters": {
        "common": {
            "reporters": ["myreporter"],
            "storage":{
                "type": "kafka",
                "kafka_cluster": "common"
            },
            "servers": {
                "localhost": {
                    "runner": true,
                    "can_be_master": true,
                    "tags": []
                }
            }
        }
    }
}
  • shiva_version

    Mandatory: Version of shiva app to deploy. File located in archives.

  • clusters.<clusterId>.reporters String[]

    MANDATORY
    A list of reporters used by shiva referenced by id. Ids must be declared in the dedicated 'reporters' section

  • clusters.<clusterId>.storage.type

    Mandatory

    Describes in which type of storage the cluster internal data will be stored: file (data stored on the filesystem) or kafka.

  • clusters.<clusterId>.storage.kafka_cluster

    Mandatory (but only relevant when type is 'kafka'). Identifier of the kafka cluster in which the shiva cluster will store its internal management and synchronization data. This must be one of the keys of the kafka.clusters dictionary documented previously.

  • clusters.<clusterId>.servers.<serverName>

    Mandatory

    For each shiva node to be deployed, section containing the configuration of the node. The server name is used for resolving the administration interface of the shiva node from the deployment machine.

  • clusters.<clusterId>.servers.<serverName>.runner

    Mandatory

    Boolean indicating if this shiva node will have the 'runner' role. Runners are in charge of executing locally tasks assigned to them by the leader (active master).

  • clusters.<clusterId>.servers.<serverName>.can_be_master

    Mandatory

    Boolean indicating if this shiva node can become the leader of the cluster. The leader is in charge of assigning tasks to an appropriate node (given the current balancing of tasks among the available runners that match the task tags requirements).

    If no leader is available, runners keep executing their assigned services, but no resilience is possible in case of a runner shutdown, and no new task or periodic job execution will occur.

  • clusters.<clusterId>.servers.<serverName>.tags

    OPTIONAL

    List of tag strings. This is useful only for worker nodes. Tags are user-defined information strings associated to each node.

    When submitting a task to the Shiva cluster, the user can specify tags. This allows placing tasks depending on user needs such as network areas, pre-installed modules required to run the task, etc. (see the sketch after this list).

    Default value: ansible_hostname

  • clusters.<clusterId>.shiva_cluster_jvm_xmx

    OPTIONAL

    The maximum heap size allowed for each Shiva node JVM (used by the Shiva startup script).
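
To illustrate the runner/leader roles, tags and JVM sizing, here is a sketch of a two-node Shiva cluster. Host names, tag values and the xmx value are illustrative assumptions only.

"shiva": {
    "shiva_version": "punchplatform-shiva-6.1.0-SNAPSHOT",
    "clusters": {
        "common": {
            "reporters": ["myreporter"],
            "storage": {
                "type": "kafka",
                "kafka_cluster": "common"
            },
            "shiva_cluster_jvm_xmx": "1G",
            "servers": {
                "node01": {
                    "runner": true,
                    "can_be_master": true,
                    "tags": ["dmz"]
                },
                "node02": {
                    "runner": true,
                    "can_be_master": false,
                    "tags": []
                }
            }
        }
    }
}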

Gateway

The Punchplatform Gateway is a REST service used to redirect requests to other REST services or to backend services, such as Punch channels management or PML. It can therefore act as a "proxy" operator, for example acting on behalf of a user connected to the Kibana Punchplatform Plugin.

It can also be used to apply a security layer over these services, providing safe authentication against standard backends (OpenLDAP, AD, ...), tenant-based access control and SSL connections.

Each cluster of gateways is deployed to serve requests on behalf of a single tenant. This allows providing different levels of available actions to different tenants of the platform, and ensures inter-tenant isolation by limiting access to this tenant configuration only.

{
  "gateway":{
    "gateway_version": "6.1.0",
    "clusters":{
      "mycluster": {
        "tenant": "mytenant",
        "servers": {
          "server1": {
            "inet_address": "server1",
            "port": 4242
          }
        },
        "elasticsearch": {
          "data_cluster": {
            "cluster_id": "es_search",
            "hosts":["server1:9200"]
          },
          "metric_cluster": {
            "cluster_id": "es_metrics",
            "hosts": ["server2:9200"],
            "index": "mytenant-metrics"
          }
        },
        "services": {
          "extraction": {
            "formats": ["csv", "json"]
          }
        },
        "resources": {
          "doc_dir": "/data/doc",
          "archives_dir": "/data/archives",
          "manager": {                                             
            "metadata": [                                          
              {                                                    
                "type": "elasticsearch",
                "hosts": [                                         
                  "server2:9200"                                   
                ],                                                 
                "index": "resources-metadata"                     
              }                                                    
            ],                                                     
            "data": [                                              
              {                                                    
                "type": "file",                                    
                "root_path": "/tmp/punchplatform/resources"
              }  
            ]                                                                                                                         
          }

        },
        "reporters": ["myreporter"]
      }
    }
  }
}
  • gateway_version

    Mandatory: Version of gateway app to deploy. File located in archives.

  • gateway.clusters.[clusterId] : Json section

    Mandatory: clusterId is a string composed of alphanumeric characters. The clusterId must be unique.

  • gateway.clusters.[clusterId].tenant: String

    Mandatory : Tenant name assigned to the cluster. The cluster will internally use the Elasticsearch cluster services, the Spark cluster services and the Zookeeper cluster services provided by the tenant configuration

  • gateway.clusters.[clusterId].modsecurity_enabled: Boolean

    Optional : if true, an apache server is deployed alongside the Gateway with a modsecurity module

  • gateway.clusters.[clusterId].channel_management_enabled: Boolean

    Optional : Default true. Requires PUNCHPLATFORM_CONF_DIR and PUNCHPLATFORM_INSTALL_DIR environment configurations. If true, enable channel management route

  • gateway.clusters.[clusterId].puncher_enabled: Boolean

    Optional : Default true. Requires the PUNCHPLATFORM_INSTALL_DIR environment configuration. If true, enable puncher route

  • gateway.clusters.[clusterId].punchline_enabled: Boolean

    Optional : Default true. Requires PUNCHPLATFORM_CONF_DIR and PUNCHPLATFORM_INSTALL_DIR environment configurations. If true, enable punchline route

  • gateway.clusters.[clusterId].servers.[serverId]: Json section

    Mandatory : server host name where the gateway must be installed

    Important

    If operator actions are provided by the gateway API (cf. 'channel_management_enabled', 'puncher_enabled', 'punchline_enabled') then the punchplatform operator environment must also be configured to be deployed on these gateway servers (see Deployment settings).

  • gateway.clusters.[clusterId].servers.[serverId].inet_address: String

    Mandatory : IP address of the interface on which the gateway server should be deployed

  • gateway.clusters.[clusterId].servers.[serverId].port: Integer

    Mandatory : Port number the gateway server will use to listen on the interface

  • gateway.clusters.[clusterId].elasticsearch : Json section

    Optional : Json section referring to the elasticsearch platform's cluster

  • gateway.clusters.[clusterId].elasticsearch.data_cluster : Json section

    Optional : Json section referring to the cluster where the data are sent

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.hosts : String array

    Mandatory : List of the data cluster's hosts. Pattern is host:port

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.prefix : String

    Optional : If set, the targeted Elasticsearch Rest API address will be modified with a path prefix for every request sent. As an example for a local cluster, setting a prefix my/path will send the ES requests to localhost:9200/my/path/{client_path}

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.settings : String array

    Optional : List of additional elasticsearch settings to configure for the data cluster. Pattern is key:value

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.credentials.user : String

    Optional : Username used by the gateway's REST client to connect to the ES cluster

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.credentials.password : String

    Optional : Password used by the gateway's REST client to connect to the ES cluster

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.ssl_enabled : Boolean

    Optional : Set it to true if the data_cluster requires SSL connections (an illustrative combination of credentials and SSL settings is sketched at the end of this settings list)

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.ssl_trusted_certificate : String

    Optional : Path to the CA file used by the gateway's REST client to connect to the ES cluster with TLS

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.ssl_private_key : String

    Optional : Path to the private key file used by the gateway's REST client to connect to the ES cluster with TLS. Must be in pkcs8 format.

  • gateway.clusters.[clusterId].elasticsearch.data_cluster.ssl_certificate : String

    Optional : Path to the public key file used by the gateway's REST client to connect to the ES cluster with TLS

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster : Json section

    Mandatory : Json section referring to the cluster where all the gateway's metrics are sent

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.index_name : String

    Mandatory : name of the index where the metrics are sent

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.hosts : String array

    Mandatory : List of the metric cluster's hosts. Pattern is host:port

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.prefix : String

    Optional : If set, the targeted Elasticsearch Rest API address will be modified with a path prefix for every request sent. As an example for a local cluster, setting a prefix my/path will send the ES requests to localhost:9200/my/path/{client_path}

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.settings : String array

    Optional : List of additional elasticsearch settings to configure for the metric cluster. Pattern is key:value

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.credentials.user : String

    Optional : Username used by the gateway's REST client to connect to the ES cluster

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.credentials.password : String

    Optional : Password used by the gateway's REST client to connect to the ES cluster

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.ssl_enabled : Boolean

    Optional : Set it to true if the metric_cluster requires SSL connections

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.ssl_trusted_certificate : String

    Optional : Path to the CA file used by the gateway's REST client to connect to the ES cluster with TLS

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.ssl_private_key : String

    Optional : Path to the private key file used by the gateway's REST client to connect to the ES cluster with TLS. Must be in pkcs8 format.

  • gateway.clusters.[clusterId].elasticsearch.metric_cluster.ssl_certificate : String

    Optional : Path to the public key file used by the gateway's REST client to connect to the ES cluster with TLS

  • gateway.clusters.[clusterId].services.extraction.formats : String array > Optional : Supported output formats for extraction. Either csv or json.

  • gateway.clusters.[clusterId].resources.doc_dir : String > Optional : Documentation location on gateway's server

  • gateway.clusters.[clusterId].resources.archives_dir : String > Mandatory : Archives storage location on gateway's server for archiving and extraction service

  • gateway.clusters.[clusterId].reporters String[]

    MANDATORY
    A list of reporters used by gateway referenced by id. Ids must be declared in the dedicated 'reporters' section
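
To close this section, the optional route toggles and the Elasticsearch client security settings documented above can be combined as in the sketch below. All values (hosts, credentials, certificate path) are illustrative assumptions; only set the SSL fields if your Elasticsearch cluster actually requires TLS.

"gateway": {
  "gateway_version": "6.1.0",
  "clusters": {
    "mycluster": {
      "tenant": "mytenant",
      "channel_management_enabled": true,
      "puncher_enabled": true,
      "punchline_enabled": false,
      "servers": {
        "server1": {
          "inet_address": "server1",
          "port": 4242
        }
      },
      "elasticsearch": {
        "data_cluster": {
          "cluster_id": "es_search",
          "hosts": ["server1:9200"],
          "credentials": {
            "user": "gateway",
            "password": "changeme"
          },
          "ssl_enabled": true,
          "ssl_trusted_certificate": "/etc/pki/ca.pem"
        },
        "metric_cluster": {
          "cluster_id": "es_metrics",
          "hosts": ["server2:9200"],
          "index": "mytenant-metrics"
        }
      },
      "reporters": ["myreporter"]
    }
  }
}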

Resource Manager

The resource manager is composed of 2 lists:

  • Metadata list section for multiple metadata backends
  • Data list section for multiple data storage backends

  • gateway.clusters.[clusterId].resources.manager.metadata.type : String > Mandatory : Metadata backend type. Only elasticsearch is currently supported

  • gateway.clusters.[clusterId].resources.manager.data.type : String > Mandatory : Data storage type. Only file is currently supported

If the metadata backend type is elasticsearch :

  • gateway.clusters.[clusterId].resources.manager.metadata.hosts : String array > Mandatory : String list of ES hosts

  • gateway.clusters.[clusterId].resources.manager.metadata.prefix : String

    Optional : If set, the targeted Elasticsearch Rest API address for metadata will be modified with a path prefix for every request sent. As an example for a local cluster, setting a prefix my/path will send the ES requests to localhost:9200/my/path/{client_path}

  • gateway.clusters.[clusterId].resources.manager.metadata.index : String > Mandatory : Index name where the metadata will be stored as json documents

  • gateway.clusters.[clusterId].resources.manager.metadata.credentials.user : String > Optional : username if the connection to ES requires a user to authenticate

  • gateway.clusters.[clusterId].resources.manager.metadata.credentials.password : String > Optional : password if the connection to ES requires a secret to authenticate

  • gateway.clusters.[clusterId].resources.manager.metadata.ssl_enabled : Boolean > Optional : Default is false. True if the connection to ES is enciphered with TLS (see the sketch at the end of this section)

  • gateway.clusters.[clusterId].resources.manager.metadata.ssl_trusted_certificate : String > Optional : Path to a CA file to use if SSL is enabled

  • gateway.clusters.[clusterId].resources.manager.metadata.ssl_certificate : String > Optional : Path to a public key file to use if SSL is enabled

  • gateway.clusters.[clusterId].resources.manager.metadata.ssl_private_key : String > Optional : Path to a private key file to use if SSL is enabled

If the data storage type is file :

  • gateway.clusters.[clusterId].resources.manager.data.root_path : String > Mandatory : Path to store the data. The gateway service MUST have the proper write access permissions to this path
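
As a complement to the main gateway example, the sketch below shows a resources.manager section using the optional Elasticsearch credentials together with a file data backend. The user, password and paths are illustrative assumptions.

"resources": {
  "manager": {
    "metadata": [
      {
        "type": "elasticsearch",
        "hosts": ["server2:9200"],
        "index": "resources-metadata",
        "credentials": {
          "user": "resources",
          "password": "changeme"
        },
        "ssl_enabled": false
      }
    ],
    "data": [
      {
        "type": "file",
        "root_path": "/tmp/punchplatform/resources"
      }
    ]
  }
}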

SSL

{
  "gateway": {
    "gateway_version": "6.1.0",
    "clusters": {
      "cluster1": {
        "tenant": "mytenant",
        "servers": {
          "server1": {
            "inet_address": "172.28.128.21",
            "port": 4242,
            "ssl": {
              "enabled": true,
              "local_key_store_path": "/tmp/jks/gateway.keystore",
              "key_store_type": "jks",
              "key_store_password": "gateway",
              "key_alias": "gateway",
              "key_password": "gateway"
            }
          }
        }
      }
    }
  }
}
  • gateway.clusters.[clusterId].servers.[serverId].ssl: Json section

    Optional : Enable SSL connection from any client to the current server

  • gateway.clusters.[clusterId].servers.[serverId].ssl.key_store_path: String

    Mandatory : Path to the keystore file containing the server's certificates

  • gateway.clusters.[clusterId].servers.[serverId].ssl.key_store_type: String

    Mandatory : Type of the keystore. Either jks or p12 (pkcs12)

  • gateway.clusters.[clusterId].servers.[serverId].ssl.key_store_password: String

    Mandatory : Password protecting the keystore

  • gateway.clusters.[clusterId].servers.[serverId].ssl.key_alias: String

    Mandatory : Alias of the server's certificate inside the keystore

  • gateway.clusters.[clusterId].servers.[serverId].ssl.key_password: String

    Mandatory : Password protecting the server's key

Spark

Apache Spark is an open-source cluster-computing framework. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance.

{
"spark" : {
    "spark_version" : "spark-2.4.3-bin-hadoop2.7",
    "punchplatform_analytics_deployment_version" : "punchplatform-analytics-deployment-6.1.0-SNAPSHOT",
      "clusters" : {
          "spark_main": {
              "master" : {
                  "servers" : ["node01"],
                  "listen_interface" : "eth0",
                  "master_port" : 7077,
                  "rest_port": 6066,
                  "ui_port" : 8081
              },
              "slaves" : {
                  "node01" : {
                      "listen_interface" : "eth0",
                      "slave_port" : 7078,
                      "webui_port" : 8084
                  },
                  "node02" : {
                      "listen_interface" : "eth0",
                      "slave_port" : 7078,
                      "webui_port" : 8084
                  },
                  "node03" : {
                      "listen_interface" : "eth0",
                      "slave_port" : 7078,
                      "webui_port" : 8084
                  }
              },
              "spark_workers_by_punchplatform_spark": 1,
              "zk_cluster" : "common",
              "zk_root" : "spark-2.4.0-main",
              "slaves_cpu" : 4,
              "slaves_memory" : "1G"
          }
      }
    }
}
  • spark_version

    Mandatory: version of apache spark

  • punchplatform_analytics_deployment_version

    Mandatory: version of PML

  • clusters.clusterId

    Mandatory: clusterId is a string composed of alphanumeric characters. The clusterId must be unique.

  • clusters.<clusterId>.master

    Mandatory: JSON content containing the spark master settings.

  • cluster.<clusterId>.master.servers

    Mandatory: a list of servers on which a spark master will be installed.

Note: some issues have been observed when using server host names here; using IP addresses is recommended.

  • cluster.<clusterId>.master.listen_interface

    Mandatory: interface to bind spark master.

  • cluster.<clusterId>.master.master_port

    Mandatory: Integer. TCP port used by the Spark master.

  • cluster.<clusterId>.master.rest_port

    Mandatory: Integer. TCP port used by the Spark master for application submission.

  • cluster.<clusterId>.master.ui_port

    Mandatory: Integer. TCP port used by the Spark master UI.

  • clusters.<clusterId>.slaves

    Mandatory: Dictionary indexed by the hostnames of the nodes composing the Spark cluster.

  • clusters.<clusterId>.slaves.nodeHostname.listen_interface

    Mandatory: Network interface on which to bind the Spark slave

  • clusters.<clusterId>.slaves.nodeHostname.slave_port

    Mandatory: Integer. TCP port used by the Spark slave.

  • clusters.<clusterId>.slaves.nodeHostname.webui_port

    Mandatory: Integer. TCP port used by the Spark slave UI.

  • clusters.<clusterId>.spark_workers_by_punchplatform_spark

    Mandatory: Integer. Number of workers per slave.

  • clusters.<clusterId>.zk_cluster

    Mandatory: Id of the Zookeeper cluster used by the Spark master for high availability.

  • clusters.<clusterId>.zk_root

    Mandatory: Zookeeper root path used by the Spark master to store its high-availability data.

  • clusters.<clusterId>.slaves_cpu

    Mandatory: Integer. Number of CPUs allocated to each slave

  • clusters.<clusterId>.slaves_memory

    Mandatory: Memory allocated to each slave (for example "1G")

  • clusters.<clusterId>.metrics

    OPTIONAL

    Metrics reporter configuration. At the moment only elasticsearch is supported (see the sketch below).

    Example : metrics.elasticsearch.cluster_id: "es_search"
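
Following the dotted path given in the example above, the corresponding cluster section would simply contain (a sketch, to be placed next to the other cluster settings):

"metrics": {
    "elasticsearch": {
        "cluster_id": "es_search"
    }
}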

Ceph

Ceph is the scalable, distributed objects storage facility used by Punchplatform for archiving, or for delivering CephFS distributed multi-mountable filesystem, or S3-compatible objects storage REST API.

Please note that at the moment, the Punchplatform deployer does not provide an automated means of running the REST API component. This component can be activated on a Ceph admin station (see ceph.admin) by referring to the Ceph documentation of the [ceph-rest-api] command and of its associated configuration.

It is used to deploy a Ceph cluster (a distributed storage system) and to archive data.

{
"ceph" : 
{ 
"version": "13.2.5",
"clusters": {
 "main": {
   "production_network": "192.168.0.0/24",
   "fsid": "b5ee2a02-b92c-4829-8d43-0eb17314c0f6",
   "storm_clusters_clients": [
     "main"
   ],
   "osd_min_bind_port": 6800,
   "osd_max_bind_port": 6803,
   "mgr_min_bind_port": 6810,
   "mgr_max_bind_port": 6813,
   "erasure_coding_profile": {
     "k": 2,
     "m": 1
   },
   "pools": {
     "mytenant-data": {
       "type": "erasure-coded",
       "pg_num": 128,
       "pgp_num": 128
     },
     "mytenant-fsmeta": {
       "type": "replicated",
       "pg_num": 32,
       "pgp_num": 32,
       "replication_factor": 2
     },
     "mytenant-fsdata": {
       "type": "erasure-coded",
       "pg_num": 32,
       "pgp_num": 32
     }
   },
   "filesystems": {
     "myfs": {
       "metadata_pool": "mytenant-fsmeta",
       "data_pool": "mytenant-fsdata"
     }
   },
   "admins": [
     "node01",
     "node02"
   ],
   "admin_rest_apis": {
     "node02": {
       "listening_address": "node02",
       "listening_port": 5050
     },
     "node03": {
       "listening_address": "node03",
       "listening_port": 5050
     }
   },
   "monitors": {
     "node01": {
       "id": 0,
       "production_address": "node01"
     },
     "node02": {
       "id": 1,
       "production_address": "node02"
     },
     "node03": {
       "id": 2,
       "production_address": "node03"
     }
   },
   "osds": {
     "node01": {
       "id": 0,
       "device": "/dev/sdb",
       "device_type": "disk",
       "crush_device_class": "hdd",
       "production_address": "node01"
     },
     "node02": [
       {
         "id": 0,
         "device": "/dev/sdb",
         "device_type": "disk",
         "crush_device_class": "ssd",
         "production_address": "node02",
         "osd_min_bind_port": 6800,
         "osd_max_bind_port": 6803
       },
       {
         "id": 101,
         "device": "/dev/sdc",
         "device_type": "disk",
         "crush_device_class": "hdd",
         "production_address": "node02",
         "osd_min_bind_port": 6850,
         "osd_max_bind_port": 6857
       }
     ],
     "node03": {
       "id": 2,
       "device": "/dev/sdb",
       "device_type": "disk",
       "crush_device_class": "hdd",
       "production_address": "node03"
     }
   },
   "managers": {
     "node01": {
       "id": 0
     },
     "node02": {
       "id": 1
     },
     "node03": {
       "id": 2
     }
   },
   "metadataservers": {
     "node01": {
       "id": 0,
       "production_address": "node01"
     },
     "node02": {
       "id": 1,
       "production_address": "node02"
     }
   }
 }
}
}
}

Warning

The Ceph package must be installed on the deployment machine; this can be done using the additional packages provided with the deployer.

  • version

    Mandatory: specify the ceph version.

  • clusters

    Mandatory: You can use several Ceph clusters, depending on your needs. Declare clusters here.

  • clusters.<cluster_name>

    Mandatory: The name of your Ceph cluster.

  • clusters.<cluster_name>.production_network

    Mandatory: Production network, used by Ceph clients to communicate with storage servers and monitors.

  • clusters.<cluster_name>.transport_network

    Optional: Transport network, used by Ceph storage servers to ensure data replication and heartbeat traffic. By default the transport network is the production network.

  • clusters.<cluster_name>.fsid

    Mandatory: Unique Ceph cluster ID.

  • clusters.<cluster_name>.storm_clusters_clients

    Mandatory: Specify here the names of the Storm clusters (specified in the punchplatform.properties configuration file). All slave nodes of the Storm clusters will be clients of the Ceph cluster.

  • clusters.<cluster_name>.osd_min_bind_port

    Optional

    OSD (data nodes) bind on one to four ports between 6800 and 7300. This default range can be overridden by specifying a min port (and a max port in next field). Default value is 6800. This must of course differ from other daemons. If you have multiple OSDs on a single node (see 'osds' setting section) then this specific parameter should be set inside each of the osd section, to ensure that the multiple OSDs of the node have different port ranges

  • clusters.<cluster_name>.osd_max_bind_port

    Optional

    OSD (data nodes) bind on one to four ports between 6800 and 7300. This default range can be overridden by specifying a max port (and a min port in previous field). Default value is 7300. This must of course differ from other daemons. If you have multiple OSDs on a single node (see 'osds' setting section) then this specific parameter should be set inside each of the osd section, to ensure that the multiple OSDs of the node have different port ranges

  • clusters.<cluster_name>.mgr_min_bind_port

    OPTIONAL

    managers nodes bind on one port between 6800 and 7300.  This default range can be overridden by specifying a min port (and a max port in next field). Default value is 6800.

  • clusters.<cluster_name>.mgr_max_bind_port

    Optional

    This default range can be overridden by specifying a max port (and a min port in previous field).

    Default value is 7300.

  • clusters.<cluster_name>.erasure_coding_profile

    Optional

    Erasure coding profile used by all erasure coded pools can be specified in this section.

  • clusters.<cluster_name>.erasure_coding_profile.k

    Mandatory

    IN erasure_coding_profile SECTION: k value is the number of data chunks. See Ceph section for more details. Be careful when specifying this parameter. Default value is (NumberOf(OSD) - 1).

  • clusters.<cluster_name>.erasure_coding_profile.m

    Mandatory IN erasure_coding_profile SECTION

    The m value is the number of coding (erasure code) chunks. It represents the number of node losses that can be tolerated. See Ceph section for more details. Be careful when specifying this parameter. Default value is 1. For example, with k=2 and m=1 (as in the sample above), each object is stored as 2 data chunks plus 1 coding chunk, so the loss of a single OSD can be tolerated.

  • clusters.<cluster_name>.pools

    Mandatory

    Dictionary that specifies the data pools that should exist and be accessible by Ceph clients from PunchPlatform storm topologies. Typically, one data pool can be declared per tenant to facilitate isolation and easy purge of a tenant if needed. Each key in the dictionary is the name of the pool.

  • clusters.<cluster_name>.pools.<pool_name>.type

    Mandatory

    Type of pool resilience: either 'replicated' (which means either a non-resilient pool, or resilience achieved through multiple storage of replicas) or 'erasure-coded' (which means resilience achieved through a RAID-like algorithm). For CephFS filesystem metadata only the 'replicated' value is supported. Note that to achieve actual resilience when using the 'replicated' value, you additionally need to provide a 'replication_factor' of at least 2.

  • clusters.<cluster_name>.pools.<pool_name>.replication_factor

    Mandatory (but only present when type is 'replicated'). This is the total number of data replicas (i.e. a value of '1' means 'non resilient'). This value may be changed afterwards to increase/reduce resilience.

  • clusters.<cluster_name>.pools.<pool_name>.pg_num

    Optional

    number of Placement Groups (aggregates of objects in a pool). Default value is 128.

  • clusters.<cluster_name>.pools.<pool_name>.pgp_num

    Optional: number of PGP. Default value is 128.

  • clusters.<cluster_name>.filesystems

    Mandatory

    Dictionary that specifies the CephFS filesystems that should exist and be accessible by Ceph clients from PunchPlatform storm topologies. Typically, a filesystem can be declared per tenant to facilitate isolation and easy purge of a tenant if needed. Each key in the dictionary is the name of the filesystem.

  • clusters.<cluster_name>.filesystems.<filesystem_name>.metadata_pool

    Mandatory

    name of a ceph pool that will store directory structure/files metadata information about the CephFS filesystem. This must be a pool of 'replicated' type.

  • clusters.<cluster_name>.filesystems.<filesystem_name>.data_pool

    Mandatory

    name of a ceph pool that will store files content of the filesystem. In current PunchPlatform release, this must be a pool of 'replicated' type.

  • clusters.<cluster_name>.admins

    Mandatory

    Array of nodes names hosting Ceph Admin nodes. These nodes will hold a copy of the ceph cluster administration keyring, and of ceph tools used for the command-line administration of the cluster.

  • clusters.<cluster_name>.admin_rest_apis

    Mandatory

    Dictionary that specifies the nodes that will run the ceph admin rest api daemon. This API will then be usable for monitoring the cluster status, either by direct invocation through a web browser, or by the Punchplatform embedded monitoring system. Keys of this dictionary must be host names reachable from the deployer node.

  • clusters.<cluster_name>.admin_rest_apis.<node_name>.listening_address

    Mandatory: Binding address on which the rest api daemon will be listening.

  • clusters.<cluster_name>.admin_rest_apis.<node_name>.listening_port

    Mandatory: Binding port on which the rest api daemon will be listening.

  • clusters.<cluster_name>.monitors

    Mandatory: Monitors maintain the cluster map (OSD endpoints, etc).

  • clusters.<cluster_name>.monitors.<node_name>

    Mandatory: Name of the monitor node.

  • clusters.<cluster_name>.monitors.<node_name>.id

    MANDATORY

    Unique ID of monitor. This ID must be unique relative to the cluster of monitor nodes (an OSD could have the same ID in the same cluster)

  • clusters.<cluster_name>.monitors.<node_name>.production_address

    Mandatory: Monitors bind to this address to listen for requests from clients.

  • clusters.<cluster_name>.osds

    Mandatory: OSDs (Object Storage Daemons) host the data.

  • clusters.<cluster_name>.osds.<node_name>

    Mandatory: Name of the osd node. The value is either a json dictionary or an array of json dictionaries, each one describing an OSD daemon running on the host and managing one block device for data storage

  • clusters.<cluster_name>.osds.<node_name>[].id

    Mandatory: IDs have to be unique in the OSD cluster (a monitor could have the same ID in the same cluster).

  • clusters.<cluster_name>.osds.<node_name>[].device

    Mandatory: Specify the device on the OSD where data is stored. This can be a disk device or a logical volume device.

  • clusters.<cluster_name>.osds.<node_name>[].crush_device_class

    Optional

    This is a device class tag that can be used to mark the node in the Ceph crush placement tree. This can then be used for placement rules. Default value is 'None', but it is advised to provide either 'hdd' or 'ssd', depending on the actual device type. Note that this value is used only by the punchplatform deployer at OSD node creation time ; if you want to change this information afterwards, please refer to standard ceph tools for updating the osd device class in the crush table.

  • clusters.<cluster_name>.osds.<node_name>[].production_address

    Mandatory

    The production address which is the endpoint used by Ceph clients to get or put data.

  • clusters.<cluster_name>.osds.<node_name>[].transport_address

    Optional

    The transport address which is used internally for data replication and heartbeat traffic. By default the transport address is the production address.

  • clusters.<cluster_name>.osds.<node_name>[].initial_weight

    Optional

    The relative weight of this storage node when deciding to store data chunks. Nominal (default) value is 1.0, which is the same as other nodes. A weight of 0.0 means NO data will be stored on this node. This value is useful when inserting a new node in an existing cluster, to avoid immediate total rebalancing ; it is also useful when clearing data from a node to prepare removal of the node. This parameter is used only by PunchPlatform cluster deployer, when creating a new OSD. To change the osd weight after deployment, please refer to official CEPH documentation or this howto.

  • clusters.<cluster_name>.osds.<node_name>[].osd_min_bind_port

    Optional: OSD (data nodes) bind on one to four ports between 6800 and 7300. This default range can be overridden by specifying a min port (and a max port in the next setting). Default value is the value provided by the same setting at cluster level. This must of course differ from other daemons. If you have multiple OSDs on a single node (see 'osds' setting section) then this specific parameter should be set inside each individual osd section, to ensure that the multiple OSDs of the node have different port ranges

  • clusters.<cluster_name>.osds.<node_name>[].osd_max_bind_port

    Optional

    OSD (data nodes) bind on one to four ports between 6800 and 7300. This default range can be overridden by specifying a max port (and a min port in previous setting). Default value is the value provided by the same setting at cluster level. This must of course differ from other daemons. If you have multiple OSDs on a single node (see 'osds' setting section) then this specific parameter should be set inside each of the individual osd section, to ensure that the multiple OSDs of the node have different port ranges

  • clusters.<cluster_name>.managers

    Mandatory: Managers provide additional monitoring and interfaces to external monitoring and management systems. They're usually collocated with monitors.

  • clusters.<cluster_name>.managers.<node_name>.id

    Mandatory

    IDs have to be unique in the managers cluster (a monitor or an OSD could have the same ID in the same cluster).

  • clusters.<cluster_name>.metadataservers

    Optional: MDS (Metadata server) : manages the structure and metadata required for CephFS filesystem instances. At least one MDS is needed for the CephFS feature activation. At least 2 must be defined for high-availability of the Ceph FS feature.

  • clusters.<cluster_name>.metadataservers.<node_name>

    Mandatory: Name of mds node. This is the host on which the feature will be deployed.

  • clusters.<cluster_name>.metadataservers.<node_name>.id

    Mandatory: IDs have to be unique in the MDS cluster (usually, first one is 0)

  • clusters.<cluster_name>.metadataservers.<node_name>.production_address

    Mandatory: The production address which is the endpoint used by CephFS clients to reach the MDS node.

Clickhouse

Clickhouse is a column-oriented database for online analytical processing of queries.

"clickhouse": {
    "clickhouse_version": "20.4.6.53",
    "clusters": {
      "common": {
        "shards": [
          {
            "servers": [
              "server1"
            ]
          }
        ],
        "zk_cluster": "common",
        "zk_root": "clickhouse",
        "http_port": 8123,
        "tcp_port": 9100
      }
    }
  }
  • clickhouse_version

    Mandatory: specify the Clickhouse version.

  • clusters.clusterId

    Mandatory: clusterId is a string composed of alphanumeric characters. The clusterId must be unique.

  • clusters.<clusterId>.shards

    Mandatory : A list of Clickhouse shards. Each shard is a json section listing its replica servers (see the sketch after this list).

  • clusters.<clusterId>.shards.servers

    Mandatory : A list of servers acting as replicas for this shard.

  • clusters.<clusterId>.http_port

    Mandatory : HTTP Port used by Clickhouse.

  • clusters.<clusterId>.tcp_port

    Mandatory : TCP Port used by Clickhouse.

  • clusters.<clusterId>.zk_cluster

    Zookeeper cluster used only if replication is enabled.

  • clusters.<clusterId>.zk_root

    Zookeeper root directory.
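
As an illustration of sharding and replication, the sketch below assumes two shards with two replica servers each; replication then relies on the configured Zookeeper cluster. Server names are illustrative assumptions.

"clickhouse": {
    "clickhouse_version": "20.4.6.53",
    "clusters": {
      "common": {
        "shards": [
          { "servers": ["server1", "server2"] },
          { "servers": ["server3", "server4"] }
        ],
        "zk_cluster": "common",
        "zk_root": "clickhouse",
        "http_port": 8123,
        "tcp_port": 9100
      }
    }
  }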

SSL certificates

This role is used to spread SSL certificates across the platform from a source directory where the certificates are stored.

The certificates will be copied inside the dedicated directory {setups_root}/certificates, for each configured server.

The setups_root value is configured in the platform section. Check this out for further information.

{
  "certificates": {
    "server1": {
      "local_certificates_dir": "/root/certs/server1"
    },
    "server2": {
      "local_certificates_dir": "/root/certs/server2"
    }
  }
}
  • local_certificates_dir

    Location inside the deployer machine, where the server's certificates are initially stored

Minio

It is used to deploy a Minio cluster (a distributed storage system) and to archive data.

 "minio": {
  "minio_version": "<RELEASE_VERSION>",
  "minio_access_key": "admin",
  "minio_secret_key": "punchplatform",
  "minio_public_cert": "<PATH_TO_PUBLIC_CRT>",
  "minio_private_key": "<PATH_TO_PRIVATE_KEY>"
    "clusters": {
      "common": {
        "hosts": [
          "server1"
        ],
        "port": "9000"
      }
    }
  }
  • minio_version

    Mandatory: specify the Minio version.

  • minio_access_key

    Mandatory: Access Key to login on Minio.

  • minio_secret_key

    Mandatory: Secret Key to login on Minio.

  • minio_public_cert

    Path to the public cert for Minio.

  • minio_private_key

    Path to the private key for Minio.

  • clusters.clusterId

    Mandatory: clusterId is a string composed of alphanumeric characters. The clusterId must be unique.

  • clusters.<clusterId>.hosts

    A list of hosts. Minio will be installed on these servers.

  • clusters.<clusterId>.port

    Port used by Minio.

Elastic Beats

Auditbeat

It is a small component that collects system call audit events and file integrity data and sends them to

an Elasticsearch cluster:

"auditbeat" : {
    "auditbeat_version" : "7.8.0",
    "reporting_interval" : 30,
    "auditd" : [
        { "hosts" : ["node01"], 
          "audit_rule" : [
            "-w /etc/passwd -p wa -k identity",
            "-a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access" 
          ]
        }
    ],
    "file_integrity" : [
        { "hosts" : ["node01"], "paths" : ["/bin"] },
        { "hosts" : ["node02", "node3"], "paths" : ["/bin", "/usr/bin"], "recursive": true, "exclude_files": ["~$"] }
    ],
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"auditbeat" : {
    "auditbeat_version" : "7.8.0",
    "reporting_interval" : 30,
    "auditd" : [
        {
          "hosts" : ["node01"],
          "audit_rule" : [
            "-w /etc/passwd -p wa -k identity",
            "-a always,exit -F arch=b32 -S open,creat,truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access" 
          ]
        }
    ],
    "file_integrity" : [
        { "hosts" : ["node01"], "paths" : ["/bin"] },
        { "hosts" : ["node02", "node3"], "paths" : ["/bin", "/usr/bin"], "recursive": true, "exclude_files": ["~$"] }
    ],
    "kafka" : {
        "cluster_id" : "local"
    }
}
  • auditbeat_version

    Mandatory: version of auditbeat

  • reporting_interval (integer)

    The time in seconds between two reports

  • auditd.hosts (string[])

    A list of hosts. Auditbeat will be installed on these servers to execute audit rules.

  • auditd.audit_rule (string)

    A string containing the audit rules that should be installed to the kernel.
    There should be one rule per line.
    Comments can be embedded in the string using # as a prefix.
    The format for rules is the same used by the Linux auditctl utility.
    Auditbeat supports adding file watches (-w) and syscall rules (-a or -A).

  • file_integrity.hosts (string[])

    A list of hosts. Auditbeat will be installed on these servers to check file integrity.

  • file_integrity.paths (string[])

    A list of paths (directories or files) to watch. Globs are not supported.
    The specified paths should exist when the metricset is started.

  • file_integrity.exclude_files (string[])

    A list of regular expressions used to filter out events for unwanted files.
    The expressions are matched against the full path of every file and directory.
    By default, no files are excluded. See Regular expression support for a list of supported regexp patterns.
    It is recommended to wrap regular expressions in single quotation marks to avoid issues with YAML escaping rules.

  • recursive (boolean: false)

    By default, the watches set to the paths specified in paths are not recursive.
    This means that only changes to the contents of these directories are watched.
    If recursive is set to true, the file_integrity module will watch for changes in these directories and all their subdirectories.

Filebeat

It is a small component that sends system logs to an Elasticsearch cluster:

"filebeat" : {
    "filebeat_version" : "7.8.0",
    "files" : [
        { "hosts" : ["node01"], "path" : ["/var/log/auth.log"] },
        { "hosts" : ["node02"], "path" : ["/var/log/syslog"] }
    ],
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"filebeat" : {
    "filebeat_version" : "7.8.0",
    "files" : [
        { "hosts" : ["node01"], "path" : ["/var/log/auth.log"] },
        { "hosts" : ["node02"], "path" : ["/var/log/syslog"] }
    ],
    "kafka" : {
        "cluster_id" : "local",
        "topic_name" : "filebeat-topic"
    }
}
  • filebeat_version

    Mandatory

    version of filebeat

  • files (map[], mandatory)

    This section contains a list of hosts and paths to monitor

  • elasticsearch (map)

    This section enables the elasticsearch reporter

  • elasticsearch.cluster_id (string, mandatory)

    Name of the elasticsearch cluster used to store the logs collected by filebeat.

  • kafka (map)

    This section enables the kafka reporter

  • kafka.cluster_id (string, mandatory)

    Name of the kafka cluster.

  • kafka.topic_name (string, mandatory)

    Name of the kafka topic to store the logs collected by filebeat

Metricbeat

It is a small component that sends system metrics to:

An Elasticsearch Cluster:

"metricbeat" : {
    "metricbeat_version" : "7.8.0",
    "modules" : {
        "system" : {
            "high_frequency_system_metrics": {
                "metricsets" : ["cpu","load","memory"],
                "reporting_interval" : "30s"
            },
            "normal_frequency_system_metrics": {
                "metricsets" : ["fsstat"],
                "reporting_interval" : "5m"
            },
            "slow_frequency_system_metrics": {
                "metricsets" : ["uptime"],
                "reporting_interval" : "1h"
            }
        }
    },
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"metricbeat" : {
    "metricbeat_version" : "7.8.0",
    "modules" : {
        "system" : {
            "high_frequency_system_metrics": {
                "metricsets" : ["cpu","load","memory"],
                "reporting_interval" : "30s"
            },
            "normal_frequency_system_metrics": {
                "metricsets" : ["fsstat"],
                "reporting_interval" : "5m"
            },
            "slow_frequency_system_metrics": {
                "metricsets" : ["uptime"],
                "reporting_interval" : "1h"
            }
        }
    },
    "kafka" : {
        "cluster_id" : "local",
        "topic_name" : "platform-system-metrics"
    }
}
  • metricbeat_version

    Mandatory: version of metricbeat

  • reporting_interval (integer)

    Interval in seconds used by metricbeat to report system metrics

  • servers (map)

    To monitor external servers, you can provide a list of additional hosts on which metricbeat will be deployed. In the end, metricbeat is deployed on all servers composing the PunchPlatform plus these additional servers.

  • modules (string, mandatory)

    Names of the metricbeat modules

  • modules.[module_name] (string, mandatory)

    Name of a dedicated custom metricbeat metricset

  • modules.[module_name].[metric_name].metricsets (string, mandatory)

    Metricsets of each module. For the full list of metricsets, take a look at the official documentation

  • modules.[module_name].[metric_name].reporting_interval (string, mandatory)

    String containing period between two metricsets collection. For example: 10s, 1m, 1h

  • modules.[module_name].[metric_name].hosts (string)

    Hosts required by some modules such as zookeeper and kafka (see the sketch below). For the full list of metricsets, take a look at the official documentation

  • elasticsearch (map)

    This section enables the elasticsearch reporter

  • elasticsearch.cluster_id (string, mandatory)

    Name of the elasticsearch cluster used to store system metrics

  • kafka (map)

    When present, this section enables the kafka metrics reporter

  • kafka.cluster_id (string, mandatory)

    Name of the kafka cluster

  • kafka.topic_name (string, mandatory)

    Name of the kafka topic to store metrics from metricbeat
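
For modules that need to reach a service endpoint, a hosts field can be added to a metricset group, as in the sketch below. The module, metricset and host values are illustrative assumptions; check the official Metricbeat documentation for the exact names supported by your version.

"modules" : {
    "zookeeper" : {
        "zookeeper_health_metrics": {
            "metricsets" : ["mntr"],
            "reporting_interval" : "30s",
            "hosts" : ["node01:2181"]
        }
    }
}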

Packetbeat

It is a small component that sends network packet information to

an Elasticsearch cluster:

"packetbeat" : {
    "packetbeat_version" : "7.8.0",
    "reporting_interval" : 30,
    "interfaces" : [
        { "hosts" : ["node01"], "interface" : "eth0" },
        { "hosts" : ["node02"], "interface" : "any" }
    ],
    "elasticsearch" : {
        "cluster_id" : "es_search"
    }
}

Or a Kafka cluster:

"packetbeat" : {
    "packetbeat_version" : "7.8.0",
    "reporting_interval" : 30,
    "interfaces" : [
        { "hosts" : ["node01"], "interface" : "eth0" },
        { "hosts" : ["node02"], "interface" : "any" }
    ],
    "kafka" : {
        "cluster_id" : "local"
    }
}
  • packetbeat_version

    Mandatory: version of packetbeat

  • reporting_interval (integer)

    Interval in seconds used by packetbeat to report network metrics

  • elasticsearch (map)

    This section enables the elasticsearch reporter

  • elasticsearch.cluster_id (string, mandatory)

    Name of the elasticsearch cluster used to store system metrics

  • kafka (map)

    This section enables the kafka reporter

  • kafka.cluster_id (string, mandatory)

    Name of the kafka cluster

Advanced content

Punch Configuration Manager

previously named: git bare

This section is used to locate the git bare repository of the PunchPlatform configuration.

"git_settings" : {
    "synchro_mode" : "ssh",
    "git_remote_user" : "gituser",
    "git_bare_server" : "node02",
    "git_bare_server_address" : "node02",
    "punchplatform_conf_repo_git_url" : "/mnt/pp00p/pp-conf.git"
}
  • synchro_mode: String

    • use ssh protocol to contact the git bare server
  • git_remote_user: String

    • Use this user to establish the ssh connection. The user must already exist.
  • git_bare_server: String

    • name of the server where the git bare is located
  • git_bare_server_address: String

    • name of the interface of the server where the git bare repository is located. If you use a custom ssh port, you can specify it using the <address>:<port> syntax.
  • punchplatform_conf_repo_git_url

    • path on the git bare server to locate the bare directory
"punchplatform_conf_repo_branch" : "master",
"lmc_conf_dir" : "pp-conf",
"supervisor_waiting_time": 15,
"supervisor_logfile_backups": 10, 
"supervisor_logfile_maxbytes": "50MB"
  • punchplatform_conf_repo_branch

    Optional

    By default, the deployer will assume you are using a branch named 'master' in your configuration directory and will clone it when deploying the PunchPlatform administration/monitoring service and when deploying the initial administration user. If you want the deployer to clone another branch, you can provide it with this setting.

  • lmc_conf_dir

    Optional: Name of the folder which contains a working copy of the git repository previously defined; used by the punchplatform-admin-server only

  • supervisor_waiting_time

    Optional: Waiting time (seconds) before ansible checks if supervisor is started or not (this checkup is performed each time a component starts)

  • supervisor_logfile_maxbytes

    OPTIONAL

    The maximum number of bytes that may be consumed by a log file before it is rotated (suffix multipliers like KB, MB and GB can be used in the value).

    Default value: 50MB

  • supervisor_logfile_backups

    OPTIONAL

    The number of log files backups to keep around resulting from process log file rotation.

    Default value: 10