Gateway

Abstract

The Punchplatform Gateway is a RESTful service placed in front of Punchplatform services such as Elasticsearch, the Punchctl client for channels, or PML jobs. It provides transparent access to Punchplatform features through endpoints designed for any external client.

Start

On a fresh standalone, run:

# run in background
punchplatform-gateway.sh --start
# run in foreground
punchplatform-gateway.sh --start-foreground
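When scripting against a freshly started Gateway, it can help to wait until the REST port answers before sending requests. The following is a minimal bash sketch. It assumes the default address localhost:4242 used throughout this page. It polls with curl rather than relying on `--status`, because the exit codes of that option are not described here. The function name `wait_for_gateway` is hypothetical and not part of the Punchplatform tooling.

```shell
# Hypothetical helper: poll the Gateway until it accepts HTTP connections.
# Usage: wait_for_gateway [url] [timeout_in_seconds]
wait_for_gateway() {
  local url="${1:-http://localhost:4242}"
  local timeout="${2:-60}"
  local waited=0
  # curl exits non-zero while the connection is refused; any HTTP answer,
  # even a 404, means the Gateway is listening
  until curl -s -o /dev/null "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Gateway not reachable at $url after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `punchplatform-gateway.sh --start && wait_for_gateway` starts the Gateway in the background and returns once it answers, or fails after 60 seconds.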

Logs

Check the Gateway's status with:

punchplatform-gateway.sh --status

On a standalone, check the application logs on your filesystem:

tail -f $PUNCHPLATFORM_CONF_DIR/../external/punchplatform-gateway-6.0.0/logs/punchplatform-gateway.log

Alternatively, check the REST API logs in Kibana, in the platform-logs index.

Feature redirection

All routing endpoints are described in the REST API documentation.

Elasticsearch

Every Elasticsearch cluster is accessible through /es/{cluster_id}.

Redirection is transparent to the Elasticsearch clusters: each request whose path matches the pattern /es/{cluster_id}/** is rerouted as-is to the corresponding cluster.

Example:

curl -XGET localhost:4242/v1/mytenant/es/es_search/_cat/indices
curl -XPUT localhost:4242/v1/mytenant/es/es_search/newindex
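Because the route only prefixes the Elasticsearch path with /v1/{tenant}/es/{cluster_id}, any Elasticsearch REST call can be wrapped in a small bash helper. This sketch assumes the standalone defaults used on this page (localhost:4242, tenant mytenant). `gw_es`, `GATEWAY_URL` and `GATEWAY_TENANT` are hypothetical names, not part of the Punchplatform distribution.

```shell
# Hypothetical helper mapping an Elasticsearch call onto the Gateway route
# /v1/{tenant}/es/{cluster_id}/{path}.
# Usage: gw_es METHOD CLUSTER_ID PATH [extra curl arguments...]
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4242}"
GATEWAY_TENANT="${GATEWAY_TENANT:-mytenant}"

gw_es() {
  local method="$1" cluster="$2" path="${3#/}"   # strip a leading slash
  shift 3
  # The remaining arguments (headers, body...) are passed to curl untouched,
  # since the Gateway forwards the request as-is to the cluster
  curl -s -X "$method" "$GATEWAY_URL/v1/$GATEWAY_TENANT/es/$cluster/$path" "$@"
}
```

For example, `gw_es GET es_search _cat/indices` reproduces the first request above, and `gw_es PUT es_search newindex` reproduces the second.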

Channels

Channel management is accessible through /channels.

GET requests return the status of channels, while POST requests start or stop a channel.

Example:

curl -v -XGET localhost:4242/v1/mytenant/channels
curl -v -XPOST localhost:4242/v1/mytenant/channels/admin/start
curl -v -XGET localhost:4242/v1/mytenant/channels/admin
curl -v -XPOST localhost:4242/v1/mytenant/channels/admin/stop
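The four calls above follow a regular shape that a small bash function can capture. This sketch uses the same standalone assumptions (localhost:4242, tenant mytenant). `gw_channel` and its `status`/`start`/`stop` sub-commands are hypothetical names.

```shell
# Hypothetical helper for the channel endpoints.
# Usage: gw_channel status [channel]   -> GET  /channels[/channel]
#        gw_channel start|stop channel -> POST /channels/channel/{start|stop}
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4242}"
GATEWAY_TENANT="${GATEWAY_TENANT:-mytenant}"

gw_channel() {
  local action="$1" channel="$2"
  local base="$GATEWAY_URL/v1/$GATEWAY_TENANT/channels"
  case "$action" in
    status)
      # Without a channel name, the request targets /channels itself
      curl -s -X GET "$base${channel:+/$channel}" ;;
    start|stop)
      if [ -z "$channel" ]; then
        echo "usage: gw_channel $action <channel>" >&2
        return 2
      fi
      curl -s -X POST "$base/$channel/$action" ;;
    *)
      echo "unknown action: $action" >&2
      return 2 ;;
  esac
}
```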

Punchline

Punchline application features are accessible through /punchline.

It allows a client to save a punchline to Elasticsearch, query saved punchlines, execute them, and retrieve their execution results.

Examples:

# save punchline
curl -XPOST localhost:4242/v1/mytenant/punchline/save \
  -F file=@/tmp/dataset_generator.hjson
# scan saved punchlines
curl -XGET localhost:4242/v1/mytenant/punchline/scan
# execute saved punchline
curl -XPOST localhost:4242/v1/mytenant/punchline/{punchline_id}
# directly execute punchline
curl -XPOST localhost:4242/v1/mytenant/punchline \
  -F file=@/tmp/dataset_generator.hjson
# get punchline execution
curl -XGET localhost:4242/v1/mytenant/punchline/{punchline_id}/executions/{execution_id}
# delete punchline
curl -XDELETE localhost:4242/v1/mytenant/punchline/{punchline_id}
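The punchline calls above differ only by method, path suffix and an optional file upload, so they can be grouped into one bash helper. This sketch uses the same assumptions as the examples (Gateway on localhost:4242, tenant mytenant). The `gw_punchline` function and its sub-command names are hypothetical. Punchline and execution ids are the ones returned by the Gateway, whose response format is not described here.

```shell
# Hypothetical wrapper around the /punchline endpoints.
# Usage: gw_punchline save FILE | scan | run FILE | exec ID | result ID EXEC_ID | delete ID
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4242}"
GATEWAY_TENANT="${GATEWAY_TENANT:-mytenant}"

gw_punchline() {
  local base="$GATEWAY_URL/v1/$GATEWAY_TENANT/punchline"
  case "$1" in
    save)   curl -s -X POST "$base/save" -F "file=@$2" ;;  # store in Elasticsearch
    scan)   curl -s -X GET "$base/scan" ;;                  # list saved punchlines
    run)    curl -s -X POST "$base" -F "file=@$2" ;;       # execute without saving
    exec)   curl -s -X POST "$base/$2" ;;                   # execute a saved punchline
    result) curl -s -X GET "$base/$2/executions/$3" ;;      # fetch an execution
    delete) curl -s -X DELETE "$base/$2" ;;                 # delete a saved punchline
    *)
      echo "usage: gw_punchline save|scan|run|exec|result|delete ..." >&2
      return 2 ;;
  esac
}
```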

Puncher

The Puncher tool, used to process punchlets, is accessible through /puncher.

It allows a client to directly execute a grok or dissect operator on an input, or to execute a complete punchlet over a log file.

Examples:

# grok operator on input
curl -XPOST localhost:4242/v1/puncher/grok \
  -F input=@/tmp/inputfile \
  -F pattern=@/tmp/patternfile
# dissect operator on input
curl -XPOST localhost:4242/v1/puncher/dissect \
  -F input=@/tmp/inputfile \
  -F pattern=@/tmp/patternfile
# execute punchlet
curl -XPOST localhost:4242/v1/puncher/dissect \
  -F input=@/tmp/inputfile \
  -F logFile=@/tmp/logfile

Resource Manager

The Resource Manager, used to store data, is accessible through /resources.

It allows a client to:

Upload a resource
Download a resource
Copy a resource
Move a resource
Delete a resource
List resources inside a tenant
Update the properties of a resource
Register an external resource

Info

See the Resource Manager Reference Guide to learn more.

Upload

Pattern: curl -vX PUT http://localhost:4242/v1/mytenant/resources/upload/<resource_name>

You must provide a form-data body with the following properties:

input: mandatory, path of the file to store
properties: optional, list of custom properties with format ["key=value"]
version: optional, specific version to store
embedded: optional, set to true to store the data inside the metadata

curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/upload/tests/test.txt' \
--form 'input=@/tmp/test.txt' \
--form 'properties={"description":"hello world, and aliens","test":true}' \
--form 'version=1' \
--form 'embedded=true'

Download

Pattern: curl -vX GET http://localhost:4242/v1/mytenant/resources/download/<resource_name>?<parameters>

You can provide the following parameters:

version: specific version to download
output: path where the downloaded resource is stored on the local filesystem

curl --location --request GET 'http://localhost:4242/v1/mytenant/resources/download/tests/test.txt?output=/tmp/output.txt&version=42'

Copy

Pattern: curl -vX PUT http://localhost:4242/v1/mytenant/resources/copy/<resource_name>

You must provide a form-data body with the following properties:

destination: mandatory, destination path of the copy
version: optional, specific version to copy
embedded: optional, set to true to copy the data inside the metadata

curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/copy/tests/test.txt' \
--form 'destination=copies/test.txt' \
--form 'version=42' \
--form 'embedded=true'

Move

Pattern: curl -vX PUT http://localhost:4242/v1/mytenant/resources/move/<resource_name>

You must provide a form-data body with the following properties:

destination: mandatory, new path of the moved file
version: optional, specific version to move
embedded: optional, set to true to move the data inside the metadata

curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/move/tests/test.txt' \
--form 'destination=moves/test.txt' \
--form 'version=42' \
--form 'embedded=true'

Delete

Pattern: curl -vX DELETE http://localhost:4242/v1/mytenant/resources/delete/<resource_name>?<parameters>

You can provide the following parameters:

version: specific version to delete

curl --location --request DELETE 'http://localhost:4242/v1/mytenant/resources/delete/images?version=42'

List

Pattern: curl -vX GET http://localhost:4242/v1/mytenant/resources/list?<parameters>

You can provide the following parameters:

pattern: wildcard pattern used to filter resources in the metadata by name
all: set to true to list every version matching the pattern; if false, only the latest version of each resource is returned
filter: only return results matching the provided key=value filter. This parameter can be repeated
output: path where the list is stored on the local filesystem
simplify: set to true to reduce each entry to its name, version and timestamp

curl --location --request GET 'http://localhost:4242/v1/mytenant/resources/list?all=true&pattern=mytests/*&filter=owner=bob&simplify=true&output=/tmp/simple.txt'
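Query values such as `mytests/*` or `owner=bob` contain characters that are safer to URL-encode than to paste raw into the URL. This bash sketch relies on curl's own `-G` and `--data-urlencode` options, which encode each value and append it to the URL as a query string. The function name `gw_list_resources` is hypothetical. The sketch assumes the Gateway decodes standard percent-encoded query parameters, as HTTP servers normally do.

```shell
# Hypothetical helper: list resources with every query parameter URL-encoded.
# Usage: gw_list_resources [name=value ...]
GATEWAY_URL="${GATEWAY_URL:-http://localhost:4242}"
GATEWAY_TENANT="${GATEWAY_TENANT:-mytenant}"

gw_list_resources() {
  local args=() param
  for param in "$@"; do
    # curl encodes the content after the first '=' and keeps the name as-is,
    # so filter=owner=bob is sent as filter=owner%3Dbob
    args+=(--data-urlencode "$param")
  done
  # -G appends the encoded values to the URL as the query string of a GET request
  curl -s -G "${args[@]}" "$GATEWAY_URL/v1/$GATEWAY_TENANT/resources/list"
}

# Example, equivalent to the request above (quote the pattern to avoid shell globbing):
#   gw_list_resources all=true 'pattern=mytests/*' filter=owner=bob \
#     simplify=true output=/tmp/simple.txt
```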

Update properties

Pattern: curl -vX POST http://localhost:4242/v1/mytenant/resources/update/<resource_name>

You must provide a raw data body in JSON format with the following properties:

version: optional, specific version to update
properties: optional, list of custom properties with format ["key=value"]

curl --location --request POST 'http://localhost:4242/v1/mytenant/resources/update/tests/test.txt' \
--form 'version=12' \
--form 'properties={"description":"hello world, and aliens","type":"json"}'

Register

Pattern: curl -vX PUT http://localhost:4242/v1/mytenant/resources/register/<resource_name>

You must provide a raw data body in JSON format with the following properties:

url: mandatory, URL of the file to register. The URL must be complete, so that a user can query it to retrieve the data
version: optional, specific version to register
properties: optional, list of custom properties with format ["key=value"]
embedded: optional, set to true to store the data inside the metadata

curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/register/tests/test.txt' \
--form 'version=42' \
--form 'url=/tmp/test/test.txt' \
--form 'properties={"description":"hello world, and aliens","test":true}'

Manual configuration

Configuration file:

$PUNCHPLATFORM_GATEWAY_INSTALL_DIR/conf/punchplatform-gateway.yml

Basic example:

spring:
  servlet:
    multipart:
      max-file-size: -1
      max-request-size: -1

# Internal server configuration for Gateway
server:
  address: localhost
  port: 4242

# One Gateway is related to a tenant
# The requested tenant is specified inside each request's path, and a wrong tenant leads to a 404 error
punchplatform:
  tenant: "mytenant"

# The Gateway has its own reporters sending the gateway metrics inside the ES metric cluster
reporters:
  elasticsearch:
    - hosts:
        - "localhost:9200"
      index_name: "mytenant-gateway-logs"

# This configuration is used for ES forwarding feature
# It MUST contain 2 sections: one to store data and one to store metrics
# There is no need to configure a 'credentials' section. If either the data cluster or the metric cluster is secured
# with authentication, each forwarded request MUST contain an Authorization header
elasticsearch:
  data_cluster:
    cluster_id: "es_data"
    hosts:
      - "server1:9200"
    settings:
      - "es.index.read.missing.as.empty: yes"
      - "es.nodes.discovery: true"
  metric_cluster:
    cluster_id: "es_metrics"
    hosts:
      - "server2:9200"
    settings:
      - "es.index.read.missing.as.empty: yes"
      - "es.nodes.discovery: true"
    index_name: "mytenant-metrics"

# Related to channel management
# Disabling this service will lead to a 404 error if requested
channels:
  enabled: true

# Related to Puncher tool
# Disabling this service will lead to a 404 error if requested
puncher:
  enabled: true

# Related to punchline execution and management
# Disabling this service will lead to a 404 error if requested
punchline:
  enabled: true

# Related to the ES forwarding service, which filters requests with the punchlet configured in this section
# Disabling this service means no filtering is applied to requests forwarded to ES
forwarding:
  enabled: true
  punchlet: "file:///home/lca/Applications/punch-standalone-6.1.0-linux/external/punch-gateway-6.1.0/conf/forwarding.punch"
  reload: "0 * * * * *"

# Related to the extraction service
# Disabling this service will lead to a 404 error if requested
services:
  extraction:
    enabled: true
    formats:
      - "csv"
      - "json"

# Related to resources management files and services like documentation and archive files
# Disabling the resource manager service will lead to a 404 error if requested
resources:
  doc_dir: "<path_to_documentation_html_page>"
  tmp_dir: "/tmp"
  archives_dir: "/tmp/extractions"
  manager:
    enabled: true
    timeout: 15000
    metadata:
      elasticsearch:
        - hosts:
            - "server2:9200"
          index: "resources-metadata"
    data:
      file:
        - root_path: "/tmp/punchplatform/manager/resources"

management:
  endpoint:
    httptrace:
      enabled: false
    mappings:
      enabled: false
  endpoints:
    enabled-by-default: false

Security

Authentication forwarding

The Gateway forwards any Authorization header to the Elasticsearch cluster.

The concerned endpoints are:

Elasticsearch

All token types supported by Elasticsearch Rest API are also supported by the Punchplatform Gateway.

Abstract

How to get the token?
For a standalone with Open Distro, the token is the base64 encoding of the "login:password" string.
You can generate a token with any base64 encoder, for example the website base64encode.org.
The token for the standalone corresponding to the credentials admin:admin is YWRtaW46YWRtaW4=.

Example:

curl -v -XGET localhost:4242/v1/mytenant/es/es_search/_cat/indices -H "Authorization: Basic YWRtaW46YWRtaW4="

yellow open platform-logs-2020.01.28             JVGEA2xsRUWDhNCVn18vdg 5 1   10 0  55.2kb  55.2kb
yellow open .kibana_92668751_admin               MdP8UNobT8SmW3U276K6iQ 1 1    1 0   3.7kb   3.7kb
green  open .kibana_1                            Zq5w1fBtSPeIQvZ45vhdyQ 1 0    0 0    261b    261b
yellow open security-auditlog-2020.01.28         W12oeFkYT7qXc3B_pcREog 5 1   11 0 174.8kb 174.8kb
yellow open platform-metricbeat-6.8.6-2020.01.28 UnTZL_U5QZqMU8bZtao94g 1 1 1479 0 962.9kb 962.9kb
green  open .opendistro_security                 Su-xHUevSL2IarcTfhu-lA 1 0    5 0  25.6kb  25.6kb
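Instead of pasting credentials into a website, you can compute the same token locally with the standard base64 tool. This is a minimal shell sketch. `basic_token` is a hypothetical helper name.

```shell
# Build the value of a Basic Authorization header from a login and a password.
# printf is used instead of echo so that no trailing newline gets encoded,
# and tr strips the line wrapping some base64 implementations apply.
basic_token() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

# Example: basic_token admin admin prints YWRtaW46YWRtaW4=
#   curl -v -XGET localhost:4242/v1/mytenant/es/es_search/_cat/indices \
#     -H "Authorization: Basic $(basic_token admin admin)"
```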

Authentication for other services

Services that connect to Elasticsearch must be configured with credentials when the cluster requires authentication.

The services that may need an Elasticsearch authentication configuration are:

ES reporters
Resource Manager

Both services use the same authentication configuration. Example:

reporters:
  elasticsearch:
    - hosts:
        - "server2:9200"
      index_name: "mytenant-gateway-logs"
      credentials:
        user: "admin"
        password: "admin"

Warning

In a production context, make sure this file is protected by an appropriate Unix account and file permissions.
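One way to follow this warning is to give the file to the account running the Gateway and remove access for everyone else. This is a shell sketch of that approach. The helper name `protect_config` and the account name `gateway_user` in the example are hypothetical; use the account that actually runs the Gateway on your platform. Changing the owner usually requires root.

```shell
# Hypothetical helper: restrict a configuration file holding credentials
# to a single owner.
# Usage: protect_config FILE [OWNER]
protect_config() {
  local file="$1" owner="${2:-}"
  if [ -n "$owner" ]; then
    chown "$owner" "$file" || return 1
  fi
  # 600: read/write for the owner only, no access for group and others
  chmod 600 "$file"
}

# Example:
#   protect_config "$PUNCHPLATFORM_GATEWAY_INSTALL_DIR/conf/punchplatform-gateway.yml" gateway_user
```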

SSL

SSL can be activated on two independent links of the Punchplatform Gateway:

Client to Gateway
Gateway to endpoints

Both features are disabled by default in the standalone; you can enable each of them in the Gateway configuration file.

SSL for clients to Gateway

A keystore is provided by the standalone in $PUNCHPLATFORM_CONF_DIR/../external/punchplatform-gateway-6.0.0/res/ssl/gateway.keystore

To activate SSL between clients and the Gateway's REST API, set server.ssl.enabled to true:

vi $PUNCHPLATFORM_CONF_DIR/../external/punchplatform-gateway-6.0.0/conf/application-gateway.yml

# conf for standalone
server:
  address: 127.0.0.1
  port: 4242
  ssl:
    enabled: true
    key-alias: "gateway"
    key-store: "/path/to/gateway.keystore"
    key-store-type: "jks"
    key-store-password: "gateway"
    key-password: "gateway"

You can also create your own keystore with:

keytool -genkey -alias myalias -keyalg RSA -keystore gateway.keystore \
          -validity 3650 -storetype JKS \
          -dname "CN=localhost, OU=Spring, O=Pivotal, L=Kailua-Kona, ST=HI, C=US" \
          -keypass changeit -storepass changeit

Then change the configuration according to your new keystore.
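Once SSL is enabled, clients call the Gateway over https://, and curl has to trust the Gateway certificate. One way to do this is to export the certificate from the keystore in PEM format with keytool's `-exportcert -rfc` command. The following shell sketch assumes the keystore, alias and password from the keytool command above. The helper name `export_gateway_cert` is hypothetical. Hostname verification still requires the certificate's subject to match the host name you call.

```shell
# Hypothetical helper: export a certificate from a JKS keystore in PEM format
# (-rfc), so that curl can trust the Gateway with --cacert.
# Usage: export_gateway_cert KEYSTORE ALIAS STOREPASS OUTPUT_PEM
export_gateway_cert() {
  keytool -exportcert -rfc \
    -keystore "$1" -alias "$2" -storepass "$3" -file "$4"
}

# Example, with the keystore generated by the keytool command above:
#   export_gateway_cert gateway.keystore myalias changeit gateway.pem
#   curl --cacert gateway.pem https://localhost:4242/v1/mytenant/channels
```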

SSL for Gateway to endpoints

Each cluster can be declared as a protected endpoint with ssl_enabled: true.

Additional SSL settings are available, depending on your security architecture:

ssl_private_key: optional, Gateway's private key used to connect to the ES cluster.
ssl_certificate: optional, Gateway's certificate used to connect to the ES cluster.
ssl_trusted_certificate: optional, CA file used by the Gateway to connect to the ES cluster. This option enables server-side authentication using the ES certificates.

elasticsearch:
  enabled: true
  data_cluster:
    cluster_id: "es_search"
    hosts:
      - "localhost:9200"
    settings:
      - "es.index.read.missing.as.empty: yes"
      - "es.nodes.discovery: true"
    ssl_enabled: true
    ssl_private_key: "/data/certs/key.pem"
    ssl_certificate: "/data/certs/cert.pem"
    ssl_trusted_certificate: "/data/certs/cafile.pem"
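Before restarting the Gateway, you can check the three PEM files by querying an Elasticsearch node directly with curl. This assumes the node accepts the client certificate. The following bash sketch uses the standard Elasticsearch `_cluster/health` endpoint as a lightweight read-only call. The helper name `check_es_tls` is hypothetical.

```shell
# Hypothetical helper: query an Elasticsearch node directly with the PEM files
# given to the Gateway, to validate them independently of the Gateway.
# Usage: check_es_tls HOST:PORT CERT KEY CA_FILE [USER:PASSWORD]
check_es_tls() {
  local host="$1" cert="$2" key="$3" ca="$4"
  local auth=()
  if [ -n "${5:-}" ]; then
    auth=(-u "$5")   # only if the cluster also requires basic authentication
  fi
  # --cert/--key match ssl_certificate/ssl_private_key, --cacert matches
  # ssl_trusted_certificate; --fail turns HTTP errors into a non-zero exit code
  curl -s --fail --cert "$cert" --key "$key" --cacert "$ca" "${auth[@]}" \
    "https://$host/_cluster/health"
}

# Example, with the files from the configuration above:
#   check_es_tls localhost:9200 /data/certs/cert.pem /data/certs/key.pem /data/certs/cafile.pem
```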

Updating static punchline node resources

If a patch to the Spark, PySpark or Storm runtime impacts node configurations, you might want to update the static punchline resources found in $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/:

spark_nodes.json
storm_nodes.json
pyspark_nodes.json

punchplatform-inspect-node.sh -h

Spark with mllib

# generates a list of json documents
punchplatform-inspect-node.sh --packages org.thales --runtime spark --mllib --base-class org.apache.spark.ml.PipelineStage > mllib_nodes
# generates a list of json documents
punchplatform-inspect-node.sh --packages org.thales --runtime spark --jar $PUNCHPLATFORM_INSTALL_DIR/lib/spark/punch-spark-uber-*.jar > spark_nodes

# manually concatenate both lists from mllib_nodes and spark_nodes into a single one, and replace the content of
# $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/spark_nodes.json with the new concatenated one.
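The manual concatenation step can be scripted. This sketch assumes that mllib_nodes and spark_nodes each contain a single JSON array, which the comments above do not state explicitly, so check the generated files first. It also assumes python3 is available and uses its standard json module. `merge_node_lists` is a hypothetical helper name.

```shell
# Hypothetical helper: concatenate JSON arrays into a single array file.
# Assumes every input file holds one JSON array of node documents.
# Usage: merge_node_lists OUTPUT INPUT...
merge_node_lists() {
  local output="$1"
  shift
  # Write to a temporary file first so that a failure leaves the target untouched
  if python3 - "$@" > "$output.tmp" <<'EOF'
import json
import sys

merged = []
for path in sys.argv[1:]:
    with open(path) as handle:
        nodes = json.load(handle)
    if not isinstance(nodes, list):
        sys.exit(f"{path}: expected a JSON array")
    merged.extend(nodes)
json.dump(merged, sys.stdout, indent=2)
EOF
  then
    mv "$output.tmp" "$output"
  else
    rm -f "$output.tmp"
    return 1
  fi
}

# Example:
#   merge_node_lists "$PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/spark_nodes.json" mllib_nodes spark_nodes
```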

Storm

# generates a list of json documents
punchplatform-inspect-node.sh --packages org.thales --runtime storm --jar $PUNCHPLATFORM_INSTALL_DIR/lib/storm/punch-topology-app-*-jar-with-dependencies.jar > $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/storm_nodes.json

Pyspark

# generates a list of json documents
punchplatform-inspect-node.sh --packages punchline_python --runtime pyspark > $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/pyspark_nodes.json

API Documentation

You can check the API documentation for more information.

The associated Javadoc is part of the user documentation, though some of it only targets developers:

punchplatform-gateway-api