Runtime resolver configuration (resolv.hjson)¶

Overview¶

Before deploying your platform, you must provide an additional resolv.hjson file in the $PUNCHPLATFORM_CONF_DIR of your deployment environment. This file reduces the verbosity of channel and punchline configuration files in day-to-day operation. It also provides automatic resolution and completion of internal component addresses when an operator submits an application or channel start command.

This file defines the translation between logical settings (e.g. an Elasticsearch cluster logical name) and actual settings (which may depend on where the communication originates, through which filter, nodes...).

This helps to have configuration files that can be easily moved from one platform to another, with minimal constraints.

A resolv.hjson file must be located, at deployment time, in the platforms/<platformName> sub-folder of your deployment configuration directory, where platformName is typically 'production'. A symbolic link named resolv.hjson must be set next to the punchplatform-deployment.settings configuration file in $PUNCHPLATFORM_CONF_DIR.

When using operator tools, the local resolver file is the one pointed to by PUNCHPLATFORM_RESOLV_FILE. By default, it points to a deployment-time copy of the resolv.hjson provided during deployment.

Each time this file changes, it should be deployed to all applicable nodes by running punchplatform-deployer.sh deploy -t platform_configuration. The new file content is applied at the next start or restart of any punch application.

The same goes for processing that occurs on other nodes (shiva, gateway), although in this case the resolver file is the one set up inside the shiva/gateway setup directory (often the /data/opt/punch-shiva- and /data/opt/punch-gateway- folders, respectively).

An example will best illustrate the role of the resolv.hjson file. Consider the following resolv.hjson:

{ 
   // All ES output nodes (Storm & Spark nodes)
   elasticsearch_nodes:{ 
      selection:{ 
         tenant:*
         channel:*
         runtime:*
         name:*
      }
      match:$.dag[?(@.type=='elasticsearch_output' || @.type=='elastic_output')].settings
      additional_values:{ 
         http_hosts:[ 
            { 
               host:node2
               port:9200
            }
         ]
      }
   }
   extraction_node:{ 
      selection:{ 
         tenant:*
         channel:*
         runtime:*
         name:*
      }
      match:$.dag[?(@.type=='extraction_input')].settings
      additional_values:{ 
         nodes:[ 
            node2
         ]
      }
   }
   // All ES spark input nodes 
   elastic_nodes:{ 
      selection:{ 
         tenant:*
         channel:*
         runtime:*
         name:*
      }
      match:$.dag[?(@.type=='elastic_input' || @.type=='elastic_query_stats' || @.type=='python_elastic_input' || @.type=='python_elastic_output')].settings
      additional_values:{ 
         nodes:[ 
            node3
         ]
      }
   }
}

This expresses that all extraction_input and elasticsearch_output nodes, from all tenants and channels, whatever the runtime type (spark, pyspark or storm) and application name of your punchlines, should use the Elasticsearch node node2. In contrast, the nodes matched by the elastic_nodes rule (for example elastic_input and python_elastic_output) should use node3.

Tip

This is not only a convenience. It is a fundamental feature that keeps end users away from low-level platform configuration issues and leaves it to the integrator/administrator to define upfront a consistent data routing for tenant and channel applications. In addition, it allows the administrator to concentrate important security settings, such as the location of secrets (certificates directory), in one secured file that is not published in the public user-level configuration.

This file is thus a series of user-defined rules. Each rule is identified by a unique id and is composed of three mandatory sections, described below.

Note

For applications launched within a channel_structure whose runtime is shiva, resource configurations can be resolved by using the apply_resolver_on key (List(str)). Each element of the list should be a file name relative to your channel application directory.

This feature is intended for applications outside the scope of planctl and punchlinectl, for instance Python applications such as elastalert or elastichousekeeper, or simply custom applications.

The DEBUG level of the shiva daemon shows more information about the apply_resolver_on behavior.

Selection¶

The selection section determines whether the rule applies to a file before that file is submitted to shiva, storm or spark. The resolver only applies to files inside a channel (except channel_structure.json) with a .json, .hjson or .yaml extension.

You can define six selection parameters inside the selection section:

tenant: name of the tenant
channel: name of the channel
runtime: runtime of the punchline (spark, storm)
name: name of the application
file: name of the file to be resolved
host: hostname of the server. Local names and IPs such as 127.0.0.1 or localhost are not supported.

This way you can select a specific rule for a use case.

For example if you want to define a rule for all applications in a specific channel:

selection:{ 
         tenant: *
         channel: apache_httpd
         runtime: *
         name: *
         file: *
         host: *
}

Or for all storm-like stream applications:

selection:{ 
         tenant: *
         channel: *
         runtime: storm
         name: *
         file: *
         host: *
}

The wildcard ('*') can be used inside a selection filter value to represent any character sequence (e.g. name:ltr_* for applications whose names start with 'ltr_'). A selection parameter that is not present is equivalent to using the '*' value.
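The selection behaviour described above can be approximated with a short Python sketch. It is only an illustration, not the resolver's actual implementation: the function name selection_matches and the context values are hypothetical, and fnmatchcase also accepts '?' and '[...]' patterns, which this page does not say the resolver supports.

```python
from fnmatch import fnmatchcase

# The six selection keys named in this section.
SELECTION_KEYS = ("tenant", "channel", "runtime", "name", "file", "host")


def selection_matches(selection, context):
    """Return True when every selection pattern matches the context value.

    A key that is absent from the selection is treated as '*'.
    """
    for key in SELECTION_KEYS:
        pattern = selection.get(key, "*")
        if not fnmatchcase(context.get(key, ""), pattern):
            return False
    return True


# Hypothetical description of the file being submitted.
context = {
    "tenant": "mytenant",
    "channel": "apache_httpd",
    "runtime": "storm",
    "name": "ltr_in",
    "file": "punchline.json",
    "host": "node1",
}

print(selection_matches({"channel": "apache_httpd"}, context))  # True
print(selection_matches({"name": "ltr_*"}, context))            # True
print(selection_matches({"runtime": "spark"}, context))         # False
```

An empty selection matches everything in this sketch, which mirrors the rule that an absent parameter behaves like '*'.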

Match¶

Once you have defined your rule scope with the selection settings, you have to define which files, and which part of them, are concerned by your rule.

Concretely, if your selection section matches all files from a specific channel and your channel is composed of several applications, you may want to define your Elasticsearch host for only one application.

To do that, use a JsonPath expression, which acts as a selector inside the JSON file.

For example, given this JSON input file:

{
  "metrics": {
    "reporters": [
      {
        "type": "kafka"
      },
      {
        "type": "elasticsearch"
      }
    ]
  }
}

To select only the elasticsearch reporter section, use this JsonPath expression:

$.metrics.reporters[?(@.type=='elasticsearch')]

It will retrieve:

[
   {
      "type" : "elasticsearch"
   }
]

All files that match both the selection section AND this section will be enriched; the other ones will be submitted without modification.

The Jayway JsonPath library is used internally. If you want to test your match section before submitting it, you can use an online JsonPath evaluator.
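For a quick local check of what the filter expression above selects, the same logic can be written by hand in plain Python. This is a hand-written equivalent of that single expression, not a JsonPath engine and not the Jayway library itself.

```python
import json

document = json.loads("""
{
  "metrics": {
    "reporters": [
      {"type": "kafka"},
      {"type": "elasticsearch"}
    ]
  }
}
""")

# Hand-written equivalent of $.metrics.reporters[?(@.type=='elasticsearch')]:
# keep every reporter whose "type" field equals "elasticsearch".
selected = [
    reporter
    for reporter in document["metrics"]["reporters"]
    if reporter.get("type") == "elasticsearch"
]

print(json.dumps(selected))  # [{"type": "elasticsearch"}]
```

As with the JsonPath filter, the result is a list, even when a single element matches.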

Additional values¶

Finally, once your file matches both previous sections, you can define what to add to it in the third section:

additional_values:{ 
         nodes:[ 
            server2
         ]
      }

These values are added at the location matched by the JsonPath expression.

To summarise how the resolver works: the first two sections of each rule select which file, and which section of this file, you want to enrich. If both match, the additional values are added to the file before submission. Multiple rules can match.
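The three sections can be tied together in a simplified Python sketch. Several things here are assumptions made for illustration only: the resolve function, the match_types field (a stand-in for a JsonPath expression of the form $.dag[?(@.type=='...')].settings), the spark_only_example rule, the kafka_input node and its settings, and the shallow merge of the additional values. How the real resolver merges a value with an already existing key is not described on this page.

```python
import copy
from fnmatch import fnmatchcase


def selection_matches(selection, context):
    # A selection key that is absent behaves like '*'.
    return all(
        fnmatchcase(context.get(key, ""), pattern)
        for key, pattern in selection.items()
    )


def resolve(punchline, rules, context):
    """Return a copy of the punchline enriched by every applicable rule."""
    resolved = copy.deepcopy(punchline)
    for rule in rules.values():
        if not selection_matches(rule["selection"], context):
            continue
        # Simplified stand-in for $.dag[?(@.type=='...')].settings
        for node in resolved.get("dag", []):
            if node.get("type") in rule["match_types"] and "settings" in node:
                # Assumption: shallow merge of the additional values.
                node["settings"].update(copy.deepcopy(rule["additional_values"]))
    return resolved


rules = {
    "elasticsearch_nodes": {
        "selection": {"tenant": "*", "channel": "*", "runtime": "*", "name": "*"},
        "match_types": {"elasticsearch_output", "elastic_output"},
        "additional_values": {"http_hosts": [{"host": "node2", "port": 9200}]},
    },
    # Hypothetical rule that only applies to spark punchlines.
    "spark_only_example": {
        "selection": {"runtime": "spark"},
        "match_types": {"elasticsearch_output"},
        "additional_values": {"nodes": ["node3"]},
    },
}

punchline = {
    "dag": [
        {"type": "kafka_input", "settings": {"topic": "logs"}},
        {"type": "elasticsearch_output", "settings": {"index": "events"}},
    ]
}
context = {
    "tenant": "mytenant",
    "channel": "apache_httpd",
    "runtime": "storm",
    "name": "ltr_in",
}

resolved = resolve(punchline, rules, context)
print(resolved["dag"][1]["settings"])
```

In this run only the first rule applies, because the punchline runtime is storm: the elasticsearch_output settings gain the http_hosts entry, the kafka_input node is left untouched, and the original punchline object is not modified.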

Debugging the resolver¶

A punch CLI command is provided for debugging Punch Resolver outputs. The computed output can be displayed with the command below:

punchlinectl -t mytenant resolve -p punchline.json

As you can see, the resolver outputs the resolved punchline on STDOUT. This command is intended for debugging only.

To resolve another kind of configuration file (channel_structure.hjson, channels monitoring configuration...), use '-f' instead of '-p'.

Remember that resolution with this tool takes into account:

the tenant (-t or PUNCHPLATFORM_TENANT environment variable, or tenant folder name),
the channel (-c or PUNCHPLATFORM_CHANNEL environment variable, or channel folder name),
the application name (-n or PUNCHPLATFORM_APP_NAME environment variable, or punchline file basename without extension),
the runtime (defined inside punchline files).
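The way these values might be derived can be sketched in Python. This sketch rests on two assumptions that this page does not confirm: that the precedence follows the order in which the sources are listed (command-line option, then environment variable, then folder or file name), and that the file sits in a tenants/<tenant>/channels/<channel>/ layout. The function names pick and resolution_context are hypothetical.

```python
import os
from pathlib import Path


def pick(flag_value, env_name, fallback, env=None):
    """Return the flag value, else the environment variable, else the fallback."""
    env = os.environ if env is None else env
    if flag_value:
        return flag_value
    return env.get(env_name) or fallback


def resolution_context(punchline_path, tenant=None, channel=None, name=None, env=None):
    path = Path(punchline_path)
    parents = path.parents
    # Assumed layout: tenants/<tenant>/channels/<channel>/<file>
    tenant_folder = parents[2].name if len(parents) > 2 else None
    return {
        "tenant": pick(tenant, "PUNCHPLATFORM_TENANT", tenant_folder, env),
        "channel": pick(channel, "PUNCHPLATFORM_CHANNEL", path.parent.name, env),
        "name": pick(name, "PUNCHPLATFORM_APP_NAME", path.stem, env),
    }


example_path = "tenants/mytenant/channels/apache_httpd/punchline.json"

# With no option and an empty environment, everything comes from the path.
print(resolution_context(example_path, env={}))
# {'tenant': 'mytenant', 'channel': 'apache_httpd', 'name': 'punchline'}
```

The runtime is left out of this sketch because, as stated above, it is read from the punchline file itself.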