Gateway¶
Abstract
The Punchplatform Gateway is a RESTful service placed in front of Punchplatform services such as Elasticsearch, the Punchctl client for channels, or PML jobs. It gives any external client transparent access to the Punchplatform features through a set of endpoints.
Start¶
On a fresh standalone, run :
# run in background
punchplatform-gateway.sh --start
# run in foreground
punchplatform-gateway.sh --start-foreground
Logs¶
Check the Gateway's status with :
punchplatform-gateway.sh --status
In standalone, check the application logs on your file system :
tail -f $PUNCHPLATFORM_CONF_DIR/../external/punchplatform-gateway-6.0.0/logs/punchplatform-gateway.log
Or check the REST API logs in Kibana, in the platform-logs index.
Feature redirection¶
All the routing endpoints are described in the REST API documentation.
Elasticsearch¶
Every Elasticsearch cluster is accessible through /es/{cluster_id}.
Requests are forwarded transparently to the Elasticsearch clusters: each path matching the pattern /es/{cluster_id}/** is rerouted directly to the corresponding cluster.
Example :
curl -XGET localhost:4242/v1/mytenant/es/es_search/_cat/indices
curl -XPUT localhost:4242/v1/mytenant/es/es_search/newindex
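The routing rule can be sketched as a small shell helper that builds the forwarded URL from a tenant, a cluster id and an Elasticsearch path. The function name gateway_es_url and the GATEWAY variable are illustrative assumptions, not part of the Punchplatform tooling; the /v1/{tenant}/es/{cluster_id} prefix is taken from the examples above.

```shell
# Hypothetical helper (not shipped with the Punchplatform): builds the Gateway URL
# that is forwarded to an Elasticsearch cluster, i.e. /v1/{tenant}/es/{cluster_id}/**.
GATEWAY="${GATEWAY:-localhost:4242}"   # assumed Gateway address, as in the examples

gateway_es_url() {
  local tenant="$1"
  local cluster_id="$2"
  local es_path="${3#/}"               # drop a leading slash from the Elasticsearch path
  printf '%s/v1/%s/es/%s/%s\n' "$GATEWAY" "$tenant" "$cluster_id" "$es_path"
}

# Usage: curl -XGET "$(gateway_es_url mytenant es_search _cat/indices)"
```

Everything after the cluster id is the regular Elasticsearch path, so any request that works against the cluster directly should look the same once prefixed this way.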
Channels¶
Channel management is accessible through /channels.
The GET method is used to request the channels status, while the POST method is used to execute start and stop actions.
Example :
curl -v -XGET localhost:4242/v1/mytenant/channels
curl -v -XPOST localhost:4242/v1/mytenant/channels/admin/start
curl -v -XGET localhost:4242/v1/mytenant/channels/admin
curl -v -XPOST localhost:4242/v1/mytenant/channels/admin/stop
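As an illustration, the start, stop and status calls can be chained to restart a channel. This is only a sketch: the restart_channel and run functions and the GATEWAY and DRY_RUN variables are hypothetical names introduced here, and only the endpoints shown in the examples above are used.

```shell
# Hypothetical wrapper: stop a channel, start it again, then print its status.
GATEWAY="${GATEWAY:-localhost:4242}"   # assumed Gateway address
DRY_RUN="${DRY_RUN:-0}"                # set to 1 to print the curl commands instead of running them

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

restart_channel() {
  local tenant="$1"
  local channel="$2"
  local base="$GATEWAY/v1/$tenant/channels/$channel"
  # -f makes curl exit with a non-zero code on HTTP errors, so the chain stops early.
  run curl -sSf -XPOST "$base/stop"  || return 1
  run curl -sSf -XPOST "$base/start" || return 1
  run curl -sSf -XGET "$base"
}

# Usage: restart_channel mytenant admin
```

Running it once with DRY_RUN=1 prints the three curl commands, which is a cheap way to check the URLs before touching a live channel.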
Punchline¶
Punchline application features are accessible through /punchline.
It allows a client to save a punchline to Elasticsearch, query the saved ones, execute them and request the execution results.
Examples :
# save punchline
curl -XPOST localhost:4242/v1/mytenant/punchline/save \
-F file=@/tmp/dataset_generator.hjson
# scan saved punchlines
curl -XGET localhost:4242/v1/mytenant/punchline/scan
# execute saved punchline
curl -XPOST localhost:4242/v1/mytenant/punchline/{punchline_id}
# directly execute punchline
curl -XPOST localhost:4242/v1/mytenant/punchline \
-F file=@/tmp/dataset_generator.hjson
# get punchline execution
curl -XGET localhost:4242/v1/mytenant/punchline/{punchline_id}/executions/{execution_id}
# delete punchline
curl -XDELETE localhost:4242/v1/mytenant/punchline/{punchline_id}
Puncher¶
The Puncher tool for punchlet processing is accessible through /puncher.
It allows a client to directly execute a grok or a dissect operator on inputs, or to execute a complete punchlet over a log file.
Examples :
# grok operator on input
curl -XPOST localhost:4242/v1/puncher/grok \
-F input=@/tmp/inputfile \
-F pattern=@/tmp/patternfile
# dissect operator on input
curl -XPOST localhost:4242/v1/puncher/dissect \
-F input=@/tmp/inputfile \
-F pattern=@/tmp/patternfile
# execute punchlet
curl -XPOST localhost:4242/v1/puncher/dissect \
-F input=@/tmp/inputfile \
-F logFile=@/tmp/logfile
Resource Manager¶
The resource manager to store data is accessible through /resources.
It allows a client to :
- Upload a resource
- Download a resource
- Copy a resource
- Move a resource
- Delete a resource
- List resources inside a tenant
- Register an external resource
Info
Check Resource Manager Reference Guide to learn more.
Upload¶
pattern : curl -vX PUT http://localhost:4242/v1/mytenant/resources/upload/<resource_name>
You must provide a form-data body with properties :
- input : mandatory, input path of the file to store
- properties: optional, list of custom properties with format ["key=value"]
- version: optional, specific version to store
- embedded: optional, set to true if you want to store the data in metadata
curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/upload/tests/test.txt' \
--form 'input=@/home/lca/Pictures/wp3136254.png' \
--form 'properties={"description":"hello world, and aliens","test":true}' \
--form 'version=1' \
--form 'embedded=true'
Download¶
pattern : curl -vX GET http://localhost:4242/v1/mytenant/resources/download/<resource_name>?<parameters>
You can provide parameters like :
- version: specific version to download
- output: output path to store the downloaded resource on local filesystem
curl --location --request GET 'http://localhost:4242/v1/mytenant/resources/download/tests/test.txt?output=/tmp/output.txt&version=42'
Copy¶
pattern : curl -vX PUT http://localhost:4242/v1/mytenant/resources/copy/<resource_name>
You must provide a form-data body with properties :
- destination : mandatory, future path of the file to copy
- version: optional, specific version to copy
- embedded: optional, set to true if you want to copy the data in metadata
curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/copy/tests/test.txt' \
--form 'destination=copies/test.txt' \
--form 'version=42' \
--form 'embedded=true'
Move¶
pattern : curl -vX PUT http://localhost:4242/v1/mytenant/resources/move/<resource_name>
You must provide a form-data body with properties :
- destination : mandatory, future path of the file to move
- version: optional, specific version to move
- embedded: optional, set to true if you want to move the data in metadata
curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/move/tests/test.txt' \
--form 'destination=moves/test.txt' \
--form 'version=42' \
--form 'embedded=true'
Delete¶
pattern : curl -vX DELETE http://localhost:4242/v1/mytenant/resources/delete/<resource_name>?<parameters>
You can provide parameters like :
- version: specific version to delete
curl --location --request DELETE 'http://localhost:4242/v1/mytenant/resources/delete/images?version=42'
List¶
pattern : curl -vX GET http://localhost:4242/v1/mytenant/resources/list?<parameters>
You can provide parameters like :
- pattern: wildcard pattern name to filter metadata resources according to this pattern
- all: set it to true if you want to list all the versions matching the pattern. If false, only the last version of each resource will be provided
- filter: filter the results matching the provided filter with format key=value. This parameter can be repeated
- output: output path to store the list on local filesystem
- simplify: set it to true if you want to simplify the provided list with only name, version and timestamp
curl --location --request GET 'http://localhost:4242/v1/mytenant/resources/list?all=true&pattern=mytests/*&filter=owner=bob&simplify=true&output=/tmp/simple.txt'
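Since the filter parameter can be repeated, a list URL with several filters can be assembled as in the sketch below. The resources_list_url function and the GATEWAY variable are hypothetical, and repeating the parameter as &filter=key=value is an assumption based on the example above. Values are not URL-encoded here.

```shell
# Hypothetical helper: builds a resources list URL with a pattern and any number of filters.
GATEWAY="${GATEWAY:-localhost:4242}"   # assumed Gateway address

resources_list_url() {
  local tenant="$1"
  local pattern="$2"
  shift 2                              # remaining arguments are key=value filters
  local url="$GATEWAY/v1/$tenant/resources/list?pattern=$pattern"
  local f
  for f in "$@"; do
    url="$url&filter=$f"               # assumption: one filter=key=value pair per filter
  done
  printf '%s\n' "$url"
}

# Usage: curl --location --request GET "$(resources_list_url mytenant 'mytests/*' owner=bob)"
```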
Update properties¶
pattern : curl -vX POST http://localhost:4242/v1/mytenant/resources/update/<resource_name>
You must provide a raw data body in JSON format with properties :
- version: optional, specific version to update
- properties: optional, list of custom properties with format ["key=value"]
curl --location --request POST 'http://localhost:4242/v1/mytenant/resources/update/tests/test.txt' \
--form 'version=12' \
--form 'properties={"description":"hello world, and aliens","type":"json"}'
Register¶
pattern : curl -vX PUT http://localhost:4242/v1/mytenant/resources/register/<resource_name>
You must provide a raw data body in JSON format with properties :
- url : mandatory, URL of the file to register. This URL must be complete and allow a user to query it to retrieve the data
- version: optional, specific version to register
- properties: optional, list of custom properties with format ["key=value"]
- embedded: optional, set to true if you want to store the data in metadata
curl --location --request PUT 'http://localhost:4242/v1/mytenant/resources/register/tests/test.txt' \
--form 'version=42' \
--form 'url=/tmp/test/test.txt' \
--form 'properties={"description":"hello world, and aliens","test":true}'
Manual configuration¶
Configuration file :
- $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/conf/punchplatform-gateway.yml
Basic example :
spring:
servlet:
multipart:
max-file-size: -1
max-request-size: -1
# Internal server configuration for Gateway
server:
address: localhost
port: 4242
# One Gateway is related to a tenant
# The requested tenant is specified inside each request's path and a wrong tenant leads to a 404 error
punchplatform:
tenant: "mytenant"
# The Gateway has its own reporters sending the gateway metrics inside the ES metric cluster
reporters:
elasticsearch:
- hosts:
- "localhost:9200"
index_name: "mytenant-gateway-logs"
# This configuration is used for ES forwarding feature
# It MUST contain 2 sections, one to store data and one to store metrics
# There is no need to configure a 'credentials' section. If either the data cluster or the metric cluster is secured
# with authentication, each forwarded request MUST contain an Authorization header
elasticsearch:
data_cluster:
cluster_id: "es_data"
hosts:
- "server1:9200"
settings:
- "es.index.read.missing.as.empty: yes"
- "es.nodes.discovery: true"
metric_cluster:
cluster_id: "es_metrics"
hosts:
- "server2:9200"
settings:
- "es.index.read.missing.as.empty: yes"
- "es.nodes.discovery: true"
index_name: "mytenant-metrics"
# Related to channel management
# Disabling this service will lead to a 404 error if requested
channels:
enabled: true
# Related to Puncher tool
# Disabling this service will lead to a 404 error if requested
puncher:
enabled: true
# Related to Punchlines executions and management
# Disabling this service will lead to a 404 error if requested
punchline:
enabled: true
# Related to forwarding service to ES, with a filtering action according to the configured punchline in this section
# Disabling this service will lead to no filtering applied on forwarded requests to ES
forwarding:
enabled: true
punchlet: "file:///home/lca/Applications/punch-standalone-6.1.0-linux/external/punch-gateway-6.1.0/conf/forwarding.punch"
reload : "0 * * * * *"
# Related to the extraction service
# Disabling this service will lead to a 404 error if requested
services:
extraction:
enabled: true
formats:
- "csv"
- "json"
# Related to resources management files and services like documentation and archive files
# Disabling the resource manager service will lead to a 404 error if requested
resources:
doc_dir: "<path_to_documentation_html_page>"
tmp_dir: "/tmp"
archives_dir: "/tmp/extractions"
manager:
enabled: true
timeout: 15000
metadata:
elasticsearch:
- hosts:
- "server2:9200"
index: "resources-metadata"
data:
file:
- root_path: "/tmp/punchplatform/manager/resources"
management:
endpoint:
httptrace:
enabled: false
mappings:
enabled: false
endpoints:
enabled-by-default: false
Security¶
Authentication forwarding¶
The Gateway forwards any Authorization header to the Elasticsearch cluster.
The concerned endpoints are :
- Elasticsearch
All token types supported by the Elasticsearch REST API are also supported by the Punchplatform Gateway.
Abstract
How to get the token?
In the case of a standalone with Opendistro, the token is the base64 encoding of the "login:password" string.
You can generate a token using, for example, the website base64encode.org.
The token for the standalone corresponding to the credentials admin:admin is YWRtaW46YWRtaW4=.
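The same token can also be computed locally, which avoids pasting credentials into a website. The sketch below assumes the base64 command-line tool is available; basic_token is a hypothetical function name.

```shell
# Build the value of a Basic Authorization header from a login and a password.
basic_token() {
  # printf (unlike echo) adds no trailing newline, which would change the encoded value.
  printf '%s:%s' "$1" "$2" | base64
}

# Usage: curl -H "Authorization: Basic $(basic_token admin admin)" ...
```

For admin and admin this yields YWRtaW46YWRtaW4=, the token given above. Keep in mind that base64 is an encoding, not encryption, so the token should be protected like the password itself.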
Example :
curl -v -XGET localhost:4242/v1/mytenant/es/_cat/indices -H "Authorization: Basic YWRtaW46YWRtaW4="
yellow open platform-logs-2020.01.28 JVGEA2xsRUWDhNCVn18vdg 5 1 10 0 55.2kb 55.2kb
yellow open .kibana_92668751_admin MdP8UNobT8SmW3U276K6iQ 1 1 1 0 3.7kb 3.7kb
green open .kibana_1 Zq5w1fBtSPeIQvZ45vhdyQ 1 0 0 0 261b 261b
yellow open security-auditlog-2020.01.28 W12oeFkYT7qXc3B_pcREog 5 1 11 0 174.8kb 174.8kb
yellow open platform-metricbeat-6.8.6-2020.01.28 UnTZL_U5QZqMU8bZtao94g 1 1 1479 0 962.9kb 962.9kb
green open .opendistro_security Su-xHUevSL2IarcTfhu-lA 1 0 5 0 25.6kb 25.6kb
Authentication for other services¶
Services that connect to Elasticsearch should be configured with credentials if needed.
The services that may need an Elasticsearch authentication configuration are :
- ES reporters
- Resource Manager
The authentication configurations for these services are the same. Example :
reporters:
  elasticsearch:
    - hosts:
        - "server2:9200"
      index_name: "mytenant-gateway-logs"
      credentials:
        user: "admin"
        password: "admin"
Warning
For a production context, be sure this file is properly protected by an appropriate Unix account and permissions.
SSL¶
There are two ways to activate SSL for the Punchplatform Gateway :
- Client to Gateway
- Gateway to endpoints
These two features are independent and disabled by default in standalone, but you can enable them in the Gateway configuration file.
SSL for clients to Gateway¶
A keystore is provided by the standalone in
$PUNCHPLATFORM_CONF_DIR/../external/punchplatform-gateway-6.0.0/res/ssl/gateway.keystore
To activate SSL from any client to the Gateway's REST API, set server.ssl.enabled to true :
vi $PUNCHPLATFORM_CONF_DIR/../external/punchplatform-gateway-6.0.0/conf/application-gateway.yml
# conf for standalone
server:
  address: 127.0.0.1
  port: 4242
  ssl:
    enabled: true
    key-alias: "gateway"
    key-store: "/path/to/gateway.keystore"
    key-store-type: "jks"
    key-store-password: "gateway"
    key-password: "gateway"
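Once SSL is enabled, clients have to use https and decide how to trust the Gateway certificate. The following is a minimal sketch: gateway_https_get, GATEWAY, CA_CERT and DRY_RUN are hypothetical names, and CA_CERT is assumed to point to a PEM file containing the Gateway certificate, exported from the keystore beforehand. The certificate must match the host name used in the URL, otherwise curl rejects it.

```shell
# Hypothetical helper: query the Gateway over HTTPS once server.ssl.enabled is true.
GATEWAY="${GATEWAY:-127.0.0.1:4242}"   # address and port from the configuration above
CA_CERT="${CA_CERT:-}"                 # assumed: optional PEM file holding the Gateway certificate
DRY_RUN="${DRY_RUN:-0}"                # set to 1 to print the curl command instead of running it

gateway_https_get() {
  local path="${1#/}"                  # drop a leading slash from the requested path
  # Trust the given certificate when available; otherwise skip verification,
  # which is only acceptable for a local test with a self-signed certificate.
  if [ -n "$CA_CERT" ]; then
    set -- curl -sSf --cacert "$CA_CERT" "https://$GATEWAY/$path"
  else
    set -- curl -sSf --insecure "https://$GATEWAY/$path"
  fi
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: gateway_https_get v1/mytenant/channels
```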
You can also create your own keystore with :
keytool -genkey -alias myalias -keyalg RSA -keystore gateway.keystore \
-validity 3650 -storetype JKS \
-dname "CN=localhost, OU=Spring, O=Pivotal, L=Kailua-Kona, ST=HI, C=US" \
-keypass changeit -storepass changeit
Then change the configuration according to your new keystore.
SSL for Gateway to endpoints¶
Each cluster can be referenced as a protected endpoint with ssl_enabled: true.
Additional SSL settings are available, depending on your security architecture :
- ssl_private_key : Optional, Gateway's private key to connect to the ES cluster.
- ssl_certificate : Optional, Gateway's certificate to connect to the ES cluster.
- ssl_trusted_certificate : Optional, Gateway's CA file to connect to the ES cluster. This option enables server-side authentication using the ES certificates.
elasticsearch:
enabled: true
data_cluster:
cluster_id: "es_search"
hosts:
- "localhost:9200"
settings:
- "es.index.read.missing.as.empty: yes"
- "es.nodes.discovery: true"
ssl_enabled: true
ssl_private_key: "/data/certs/key.pem"
ssl_certificate: "/data/certs/cert.pem"
ssl_trusted_certificate: "/data/certs/cafile.pem"
Updating static punchlines nodes resources¶
In case a patch is made on either the spark, pyspark or storm runtime, impacting node configurations, you might want to update the punchlines static resources found in $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/ :
- spark_nodes.json
- storm_nodes.json
- pyspark_nodes.json
punchplatform-inspect-node.sh -h
Spark with mllib
# generates a list of json documents
punchplatform-inspect-node.sh --packages org.thales --runtime spark --mllib --base-class org.apache.spark.ml.PipelineStage > mllib_nodes
# generates a list of json documents
punchplatform-inspect-node.sh --packages org.thales --runtime spark --jar $PUNCHPLATFORM_INSTALL_DIR/lib/spark/punch-spark-uber-*.jar > spark_nodes
# manually concatenate both lists from mllib_nodes and spark_nodes into a single one and replace the content of
# $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/spark_nodes.json with the new concatenated one.
Storm
# generates a list of json documents
punchplatform-inspect-node.sh --packages org.thales --runtime storm --jar $PUNCHPLATFORM_INSTALL_DIR/lib/storm/punch-topology-app-*-jar-with-dependencies.jar > $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/storm_nodes.json
Pyspark
# generates a list of json documents
punchplatform-inspect-node.sh --packages punchline_python --runtime pyspark > $PUNCHPLATFORM_GATEWAY_INSTALL_DIR/punchlines/pyspark_nodes.json
API Documentation¶
You can check the API documentation for more information.
The associated javadoc is a part of the user documentation, though some of it targets only the developers community :