punchplatform-objects-storage.sh

Synopsis

punchplatform-objects-storage.sh [command options]

Description

This command can be used to list topics from an archiving system, get a status of a pool or a topic, delete a topic, or extract data from a topic.

Be careful: the delete-topic command deletes data from the archiving storage and all its references in the indexing cluster.

The extract-scope command can be used to extract data from archives but offers no resiliency: if the command is stopped, the extraction is not resumed from the point of failure. This command is suitable for extracting small volumes of data. For large volumes, use topologies (in particular the ArchiveSpout).
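
Because an interrupted extract-scope run is not resumed, one way to keep re-runs cheap is to split a moderate time range into small windows and extract each window with its own invocation. The sketch below is only an illustration: the `day_windows` and `extract_day` helpers, the `DRY_RUN` switch and the per-day granularity are assumptions rather than features of the tool, and the cluster, pool, topic and directory names are placeholders. For genuinely large volumes, topologies remain the recommended path.

```shell
#!/bin/sh
# Sketch: split part of a month into per-day windows and extract each day separately.
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# day_windows YEAR MONTH LAST_DAY OFFSET  (hypothetical helper)
# Prints one "from to" pair per line, for days 1..LAST_DAY of the month.
day_windows() {
    d=1
    while [ "$d" -le "$3" ]; do
        day=$(printf '%02d' "$d")
        echo "$1-$2-${day}T00:00:00$4 $1-$2-${day}T23:59:59$4"
        d=$((d + 1))
    done
}

# extract_day FROM TO  (hypothetical helper)
# Builds one extract-scope invocation for the given window.
extract_day() {
    set -- punchplatform-objects-storage.sh extract-scope \
        --cluster ceph:main --pool mytenant-data \
        --elasticsearch-cluster-name es_search --topic apache \
        --from-date "$1" --to-date "$2" \
        --into /my/extraction/directory --no-progress-bar
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

day_windows 2018 04 3 +01:00 | while read -r from to; do
    # Stop at the first failing day so that it can be re-run on its own.
    extract_day "$from" "$to" || { echo "failed: $from" >&2; exit 1; }
done
```

With `DRY_RUN=1` the loop only prints the three commands; setting `DRY_RUN=0` would execute them one day at a time.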

Commands

  • delete-topic (WARNING : CANNOT BE UNDONE) :

    • Purges all tuple file objects, partitions and topic index data for the specified topic.
  • dump-object :

    • Decode and dump tuples documents from a stored object.
  • extract-scope :

    • Decode and dump tuples documents from a topic, optionally restricting extraction to a time scope using the stored tuple timestamp field.
  • list-topics :

    • List 'topics' (sets of tuples-containing objects indexed by PunchPlatform).
  • pool-status :

    • Displays usage and health statistics of a pool within a cluster.
  • topic-status :

    • Displays detailed data contained in the topic. Warning: this command can take up to a few minutes to execute, due to topic index data aggregation.

Options

For each command, the following options are available:

For delete-topic :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'ceph:myclustername' or 'file:///tmp/mypooldata'
  • --do-it-without-question :

    • removes need for interactive confirmation of deletion
    • Default: false
  • --elasticsearch-index-prefix :

    • optional prefix added to Elasticsearch index name used to index stored data
  • --elasticsearch-index-suffix :

    • optional suffix added to Elasticsearch index name used to index stored data
  • --elasticsearch-cluster-name (mandatory if --elasticsearch-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --elasticsearch-cluster-url (mandatory if --elasticsearch-cluster-name not specified) :

    • url of Elasticsearch cluster used to index stored data
  • --credentials :

    • Credentials for Basic ElasticSearch authentication if using OpenDistro.
  • --token :

    • Token for ElasticSearch authentication if using OpenDistro.
  • --tokentype (mandatory if --token is specified) :

    • Token type for ElasticSearch authentication if using OpenDistro.
  • --from-date (mandatory) :

    • ISO-8601 datetime of start of time scope to extract or to remove (e.g. '2016-12-12T03:04:05+01:00')
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --to-date (mandatory) :

    • ISO-8601 datetime of end of time scope to extract or to remove (e.g. '2016-12-19T03:04:05+01:00')
  • --topic (mandatory) :

    • name of the topic (see 'list-topics' command).
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin
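
The conditional rules above (one of --elasticsearch-cluster-name or --elasticsearch-cluster-url must be given, and --tokentype is required whenever --token is given) can be checked in a wrapper before the command is launched. A minimal sketch, assuming hypothetical variables ES_NAME, ES_URL, ES_TOKEN and ES_TOKENTYPE; the `es_args` helper is not part of PunchPlatform, and preferring the cluster name when both name and URL are set is an arbitrary choice made for this example:

```shell
#!/bin/sh
# es_args: print the Elasticsearch-related options, or fail when the
# documented constraints are not met.
# ES_NAME, ES_URL, ES_TOKEN and ES_TOKENTYPE are hypothetical variable names.
es_args() {
    if [ -z "${ES_NAME:-}" ] && [ -z "${ES_URL:-}" ]; then
        echo "set ES_NAME or ES_URL" >&2
        return 1
    fi
    if [ -n "${ES_TOKEN:-}" ] && [ -z "${ES_TOKENTYPE:-}" ]; then
        echo "ES_TOKENTYPE is required when ES_TOKEN is set" >&2
        return 1
    fi
    # Validation passed: emit the options on a single line.
    if [ -n "${ES_NAME:-}" ]; then
        printf '%s %s' --elasticsearch-cluster-name "$ES_NAME"
    else
        printf '%s %s' --elasticsearch-cluster-url "$ES_URL"
    fi
    if [ -n "${ES_TOKEN:-}" ]; then
        printf ' %s %s %s %s' --token "$ES_TOKEN" --tokentype "$ES_TOKENTYPE"
    fi
    echo
}

ES_NAME=es_search
opts=$(es_args) || exit 1
echo "Elasticsearch options: $opts"
```

The resulting string is meant to be appended, unquoted, to a command line, so this sketch only works for values that contain no whitespace.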

For dump-object :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'ceph:myclustername' or 'file:///tmp/mypooldata'
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --object (mandatory) :

    • Name of object in object storage pool.
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin

For extract-scope :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'ceph:myclustername' or 'file:///tmp/mypooldata'
  • --elasticsearch-cluster-name (mandatory if --elasticsearch-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --elasticsearch-cluster-url (mandatory if --elasticsearch-cluster-name not specified) :

    • url of Elasticsearch cluster used to index stored data
  • --credentials :

    • Credentials for Basic ElasticSearch authentication if using OpenDistro.
  • --token :

    • Token for ElasticSearch authentication if using OpenDistro.
  • --tokentype (mandatory if --token is specified) :

    • Token type for ElasticSearch authentication if using OpenDistro.
  • --from-date (mandatory) :

    • ISO-8601 datetime of start of time scope to extract or to remove (e.g. '2016-12-12T03:04:05+01:00')
  • --into :

    • If a target root directory is provided, files will be extracted into a directory structure 'targetRoot/<Year>/<Month>/<Day>/<TopicName>-<First_Doc_Date>-<Last_Doc_Date>-<PartitionId>-<FileNum>', based on the earliest document in the file, instead of being decoded to stdout.
  • --key-aliases :

    • If a key alias, a keystore (--keystore) and a target directory (--into) are specified, data will be deciphered before extraction.
  • --keystore :

    • If a keystore, a key alias (--key-alias) and a target directory (--into) are specified, data will be deciphered before extraction.
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --no-summary :

    • do not display the time-scope selection aggregated statistics.
    • Default: false
  • --older-than-days :

    • removes all data older than specified days
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --summary-only :

    • do not actually process file contents, but only print out aggregates/stats of the selected tuple files.
    • Default: false
  • --to-date (mandatory) :

    • ISO-8601 datetime of end of time scope to extract or to remove (e.g. '2016-12-19T03:04:05+01:00')
  • --topic (mandatory) :

    • name of the topic (see 'list-topics' command).
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin
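
The --summary-only, --into and --no-progress-bar options lend themselves to a two-step workflow: first print the aggregates for the selected scope, then run the actual extraction. The following is a sketch only; the `scope` function and the `DRY_RUN` switch are hypothetical helpers, and the cluster, pool, topic and directory names are placeholders.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

# scope EXTRA_OPTIONS...  (hypothetical helper)
# Runs extract-scope on a fixed one-minute window, appending the extra options.
scope() {
    set -- punchplatform-objects-storage.sh extract-scope \
        --cluster ceph:main --pool mytenant-data \
        --elasticsearch-cluster-name es_search --topic apache \
        --from-date 2018-04-02T07:37:00+01:00 \
        --to-date 2018-04-02T07:37:59+01:00 "$@"
    if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Step 1: only print statistics about the selected tuple files.
scope --summary-only

# Step 2: extract to files, without the progress bar so the output can be logged.
scope --into /my/extraction/directory --no-progress-bar
```

Checking the summary first gives an idea of the volume involved before committing to an extraction that cannot be resumed if interrupted.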

For list-topics :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'ceph:myclustername' or 'file:///tmp/mypooldata'
  • --details :

    • add detailed index information on topic content (contained partitions, files, tuples, bytes and timescope)
  • --elasticsearch-cluster-name (mandatory if --elasticsearch-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --elasticsearch-cluster-url (mandatory if --elasticsearch-cluster-name not specified) :

    • url of Elasticsearch cluster used to index stored data
  • --credentials :

    • Credentials for Basic ElasticSearch authentication if using OpenDistro.
  • --token :

    • Token for ElasticSearch authentication if using OpenDistro.
  • --tokentype (mandatory if --token is specified) :

    • Token type for ElasticSearch authentication if using OpenDistro.
  • --from-date (mandatory) :

    • ISO-8601 datetime of start of time scope to extract or to remove (e.g. '2016-12-12T03:04:05+01:00')
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --to-date (mandatory) :

    • ISO-8601 datetime of end of time scope to extract or to remove (e.g. '2016-12-19T03:04:05+01:00')
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin
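
Since --from-date and --to-date expect ISO-8601 datetimes, a wrapper script may want to reject malformed values before calling the command. A minimal sketch, assuming a hypothetical `is_scope_date` helper; it accepts only the shape used in the examples on this page (date, time and a numeric ±hh:mm offset), which is deliberately narrower than ISO-8601, so the tool itself may accept forms that this check rejects.

```shell
#!/bin/sh
# is_scope_date VALUE  (hypothetical helper)
# Succeeds if VALUE looks like 2016-12-12T03:04:05+01:00.
# Only the shape is checked; field ranges (month 13, hour 25, ...) are not.
is_scope_date() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[+-][0-9]{2}:[0-9]{2}$'
}

for value in '2018-01-01T00:00:00+01:00' '2018-01-01 00:00:00'; do
    if is_scope_date "$value"; then
        echo "ok:  $value"
    else
        echo "bad: $value"
    fi
done
```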

For pool-status :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'ceph:myclustername' or 'file:///tmp/mypooldata'
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin

For topic-status :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'ceph:myclustername' or 'file:///tmp/mypooldata'
  • --elasticsearch-cluster-name (mandatory if --elasticsearch-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --elasticsearch-cluster-url (mandatory if --elasticsearch-cluster-name not specified) :

    • url of Elasticsearch cluster used to index stored data
  • --credentials :

    • Credentials for Basic ElasticSearch authentication if using OpenDistro.
  • --token :

    • Token for ElasticSearch authentication if using OpenDistro.
  • --tokentype (mandatory if --token is specified) :

    • Token type for ElasticSearch authentication if using OpenDistro.
  • --from-date (mandatory) :

    • ISO-8601 datetime of start of time scope to extract or to remove (e.g. '2016-12-12T03:04:05+01:00')
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --partition :

    • when specified, only statistics for the provided partition with this id will be output.
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --reindex :

    • Use this option to re-index a topic (aggregates computing).
    • Default: false
  • --since-days :

    • When specified (with --reindex option enabled), only last specified days will be reindexed.
  • --to-date (mandatory) :

    • ISO-8601 datetime of end of time scope to extract or to remove (e.g. '2016-12-19T03:04:05+01:00')
  • --topic (mandatory) :

    • name of the topic (see 'list-topics' command).
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin

Examples

  • To get the status of pool 'mytenant-data' from a Ceph cluster named 'main':

    punchplatform-objects-storage.sh pool-status --pool mytenant-data --cluster ceph:main

  • To list the topics of pool 'mytenant-data' from Ceph cluster 'main' in January 2018:

    punchplatform-objects-storage.sh list-topics --cluster ceph:main --pool mytenant-data --elasticsearch-cluster-name es_search --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00

  • To print details of the topics (number of tuples, size, ...), add the --details option:

    punchplatform-objects-storage.sh list-topics --cluster ceph:main --pool mytenant-data --elasticsearch-cluster-name es_search --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00 --details

  • To get the status of a particular topic 'apache' in January 2018:

    punchplatform-objects-storage.sh topic-status --cluster ceph:main --pool mytenant-data --elasticsearch-cluster-name es_search --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00 --topic apache

  • To dump an object 'all-logs-parsed-archive-0-3391' from pool 'mytenant-data':

    punchplatform-objects-storage.sh dump-object --cluster ceph:main --pool mytenant-data --object all-logs-parsed-archive-0-3391

  • To perform a small extraction of topic 'apache' on 2nd April 2018 at 7:37 a.m. to stdout:

    punchplatform-objects-storage.sh extract-scope --cluster ceph:main --pool mytenant-data --elasticsearch-cluster-name es_search --from-date 2018-04-02T07:37:00+01:00 --to-date 2018-04-02T07:37:59+01:00 --topic apache

  • To extract this data into directory /my/extraction/directory, use the --into option:

    punchplatform-objects-storage.sh extract-scope --cluster ceph:main --pool mytenant-data --elasticsearch-cluster-name es_search --from-date 2018-04-02T07:37:00+01:00 --to-date 2018-04-02T07:37:59+01:00 --topic apache --into /my/extraction/directory

  • To delete topic 'apache' on 14th June 2018 (be careful, it cannot be undone):

    punchplatform-objects-storage.sh delete-topic --cluster ceph:main --pool mytenant-data --elasticsearch-cluster-name es_search --from-date 2018-06-14T00:00:00+01:00 --to-date 2018-06-14T23:59:59+01:00 --topic apache

Files

  • punchplatform.properties:
    • This json file defines the punchplatform settings. It is used by punchplatform-kafka-topics.sh primarily to know the broker list.

See also

punchplatform-env