Skip to content

Data extraction plugin

The Data Extraction plugin makes it possible to extract data from Elasticsearch. It requires the Punch Gateway to launch an extraction job in the background.

Installation

The plugin dataExtraction-X.Y.Z.zip is available in the Standalone and in the Deployer.

bash $KIBANA_INSTALL_DIR/bin/kibana-plugin install file:///<path-to-dataExtraction-X.Y.Z.zip>

Configuration

The plugin has the following parameters in $KIBANA_INSTALL_DIR/config/kibana.yml

# Mandatory
data_extraction.api.hosts: ["http://localhost:4242"]
data_extraction.tenant: mytenant

# Optional with default values
data_extraction.enabled: true
data_extraction.useLegacy: false
data_extraction.api.ssl.enabled: false
data_extraction.api.ssl.certificateAuthorities: [""]
data_extraction.api.ssl.certificate: ""
data_extraction.api.ssl.key: ""

Usage

Display extractions

Click on the Data extraction tile in Kibana. All extractions created are displayed on this screen.

Kibana Punchplatform Plugin Data Extraction Scheduler

You can see extraction information like description, current status, duration and output location.

  • Status: Available status are Scheduled, Submitted, Running, Success, Failed.
  • Output: You can download file directly. It will be an archive containing multiple files (in CSV or NDJSON).
  • Logs: Click on the arrow at the right to toggle the display of its logs.

You can manually refresh the extractions by refreshing the whole page.

Create an extraction

Before you create an extraction

Make sure :

  • This Punch Plugin instance is connected to a Punch Gateway.
  • This Punch Gateway is connected to an ES data cluster.
  • The data you try to extract are located inside this ES data cluster.
  • You have at least a Saved search or an Index Pattern in Kibana.
  • Punch template pp_mapping_metadata.json has been pushed to ES clusters.

To create a new extraction, click on Create extraction at the top of the page.

Fill in the fields

Kibana Punchplatform Plugin Data Extractions Editor

  • Kibana saved search: Select from which save search (made from Discover) to extract the data
  • Extraction range: Select date range
  • Max size: Set maximum output rows
  • Extract _id: Check to add a column id which contains Elasticsearch document _id
  • Extract all fields: Check to add a column source which contains Elasticsearch document _source
  • Fields to extract: Select fields you want in your subset. Click on the field to put it into the other column. The available fields are on the left, the selected fields are on the right.
    TIPS: Use arrows between columns to move all from left to right and vice-versa.
  • Filters: Add filters on your extract data
  • Description: Name your extraction
  • Output format: Choose where to extract, in another Elasticsearch index, or in file (CSV/JSON)
  • Separator: For CSV format. Choose separator to delimit your columns in output.
  • Headers: For CSV format. Include column headers in output.
  • Tenant: Related to the Gateway tenant.
Change Runtime

Plugin Extraction is based on a Punch Topology. This topology can be run in Storm runtime (default) or in Spark runtime.

Plugin Extraction in Storm runtime is based on a Storm-like topology with : - Extraction Input - Punchlet Node to flatten tuples - File Output

Plugin Extraction in Spark runtime is based on a Spark-like topology with : - Elastic Input - File Output

To configure Plugin Extraction in Spark runtime, configure kibana.yml with :

data_extraction.useLegacy: true

Warning

Please note that Spark extraction requires data to fit in memory. If you want to extract massive amount of data, use Storm extraction instead.

Update extraction template

The topology to run extraction is based on a topology template. This template is filled with the information provided in the Data Extraction plugin. You can modify this template to fine tune your extraction.

Storm template:
$PUNCHPLATFORM_GATEWAY_INSTALL_DIR/templates/extraction/storm/extraction_es_to_file.yaml.j2

Spark template:
$PUNCHPLATFORM_GATEWAY_INSTALL_DIR/templates/extraction/spark/extraction_es_to_file.yaml.j2

Troubleshoot extraction

You can view templated topologies in the path configured in resources.manager.data.file.root_path + resources.punchlines_dir. You can view extraction results in the path configured in resources.archives_dir.

Compatibility

Plugin version / Punch version 6.4.1 6.4.2 6.4.3 6.4.4 6.4.5
1.2.1 included yes yes yes yes
1.2.4 yes included yes yes yes
1.2.5 yes yes included included included

Legend

  • yes : compatible
  • no : not compatible
  • included : included in Deployer & Standalone binaries