punchplatform-archive-client.sh

Synopsis

punchplatform-archive-client.sh [command options]

Description

This command lists the topics and objects of an archiving system, dumps an object, reports the status of a pool or a topic, or deletes a topic.

Be careful: the delete-topic command deletes data from the archiving storage along with all its references in the indexing cluster.

Command

  • list-topics :

    • List 'topics' (sets of tuple-containing objects indexed by PunchPlatform).
  • list-objects :

    • List the objects contained in a topic (can be filtered with bloom filters, see --match).
  • dump-object :

    • Print uncompressed object from an object-id.
  • pool-status :

    • Displays usage and heal statistics of a pool within a cluster.
  • topic-status :

    • Displays detailed data contained in the topic. Warning: this command can take up to a few minutes to run, because it aggregates the topic index data.

Options

For each command, the following options are available:

For list-topics :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'file:///tmp/mypooldata'
  • --details :

    • add detailed index information on topic content (contained partitions, files, tuples, bytes and timescope)
  • --es-cluster-name (mandatory if --es-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --es-cluster-url (mandatory if --es-cluster-name not specified) :

    • URL of Elasticsearch cluster used to index stored data
  • --es-index (mandatory) :

    • Elasticsearch index name used to index stored data
  • --credentials :

    • Credentials for Basic Elasticsearch authentication if using OpenDistro.
  • --token :

    • Token for Elasticsearch authentication if using OpenDistro.
  • --tokentype (mandatory if --token is specified) :

    • Token type for Elasticsearch authentication if using OpenDistro.
  • --from-date (mandatory) :

    • ISO-8601 datetime of start of time scope to extract or to remove (e.g. '2016-12-12T03:04:05+01:00')
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --to-date (mandatory) :

    • ISO-8601 datetime of end of time scope to extract or to remove (e.g. '2016-12-19T03:04:05+01:00')
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin
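
The --from-date and --to-date values described above can be generated rather than typed by hand. The following is a minimal sketch, assuming GNU date is available (the -d option and the %:z format are GNU extensions); it builds a time scope covering the last 7 days. The variable names are illustrative only.

```shell
# Assumption: GNU date. %:z prints the UTC offset as +hh:mm, which matches
# the ISO-8601 form used in this page (e.g. 2016-12-12T03:04:05+01:00).
iso_format='+%Y-%m-%dT%H:%M:%S%:z'

to_date=$(date "$iso_format")                    # now
from_date=$(date -d '7 days ago' "$iso_format")  # one week earlier

# Print the two options, ready to be appended to a command line.
echo "--from-date $from_date --to-date $to_date"
```

The two variables can then be passed as `--from-date "$from_date" --to-date "$to_date"` to any command that requires a time scope.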

For list-objects :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'file:///tmp/mypooldata'
  • --es-cluster-name (mandatory if --es-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --es-cluster-url (mandatory if --es-cluster-name not specified) :

    • URL of Elasticsearch cluster used to index stored data
  • --es-index (mandatory) :

    • Elasticsearch index name used to index stored data
  • --credentials :

    • Credentials for Basic Elasticsearch authentication if using OpenDistro.
  • --token :

    • Token for Elasticsearch authentication if using OpenDistro.
  • --tokentype (mandatory if --token is specified) :

    • Token type for Elasticsearch authentication if using OpenDistro.
  • --from-date (mandatory) :

    • ISO-8601 datetime of start of time scope to extract or to remove (e.g. '2016-12-12T03:04:05+01:00')
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --topic (mandatory) :

    • Object topic name within object storage cluster. e.g.: 'apache'
  • --to-date (mandatory) :

    • ISO-8601 datetime of end of time scope to extract or to remove (e.g. '2016-12-19T03:04:05+01:00')
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin
  • --match :

    • Value to match in a bloom filter field.

For pool-status :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'file:///tmp/mypooldata'
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin

For dump-object :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'file:///tmp/mypooldata'
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --object-id (mandatory) :

    • Object id. e.g.: 'httpd/2019.12.19/puncharchive-httpd-E8M9Hm8BuW5Odu0CcGQx-0-500937.csv.gz'
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin
  • --es-cluster-name (mandatory if --es-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --es-cluster-url (mandatory if --es-cluster-name not specified) :

    • URL of Elasticsearch cluster used to index stored data
  • --es-index (mandatory) :

    • Elasticsearch index name used to index stored data

For topic-status :

  • --cluster (mandatory) :

    • Connection/address string to locate the object storage cluster. e.g.: 'ceph_configuration:///myconfig.conf' or 'file:///tmp/mypooldata'
  • --es-cluster-name (mandatory if --es-cluster-url not specified) :

    • name of Elasticsearch cluster used to index stored data
  • --es-cluster-url (mandatory if --es-cluster-name not specified) :

    • URL of Elasticsearch cluster used to index stored data
  • --es-index (mandatory) :

    • Elasticsearch index name used to index stored data
  • --credentials :

    • Credentials for Basic Elasticsearch authentication if using OpenDistro.
  • --token :

    • Token for Elasticsearch authentication if using OpenDistro.
  • --tokentype (mandatory if --token is specified) :

    • Token type for Elasticsearch authentication if using OpenDistro.
  • --from-date (mandatory) :

    • ISO-8601 datetime of start of time scope to extract or to remove (e.g. '2016-12-12T03:04:05+01:00')
  • --no-progress-bar :

    • prevent display of a progress bar to stdout (useful if you need to redirect/post-process the output)
    • Default: false
  • --partition :

    • When specified, only the statistics of the partition with this id are output.
  • --pool (mandatory) :

    • Object pool name within object storage cluster. e.g.: 'data'
  • --reindex :

    • Use this option to re-index a topic (aggregates computing).
    • Default: false
  • --since-days :

    • When specified (together with the --reindex option), only the specified number of most recent days is reindexed.
  • --to-date (mandatory) :

    • ISO-8601 datetime of end of time scope to extract or to remove (e.g. '2016-12-19T03:04:05+01:00')
  • --topic (mandatory) :

    • name of the topic (see 'list-topics' command).
  • --user :

    • User name used to access the Ceph cluster.
    • Default: admin
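
As an illustration of the --reindex and --since-days options listed above, the following sketch recomputes the aggregates of topic 'apache' for the last 3 days only. It assumes that --reindex is a bare flag (like --details in the Examples section) and that --since-days takes an integer number of days; this page confirms neither. The cluster, pool and index names are the placeholder values used in the Examples section. The command is printed first and only executed if the script is found on the PATH.

```shell
# Assemble the arguments once so they can be printed and then reused.
# Assumptions: --reindex takes no value, --since-days takes an integer.
set -- topic-status \
    --cluster ceph_configuration:///etc/ceph/main.conf \
    --pool mytenant-data --topic apache \
    --es-cluster-name es_search --es-index mytenant-archive \
    --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00 \
    --reindex --since-days 3

# Show the full command line before running anything.
cmd_line="punchplatform-archive-client.sh $*"
echo "$cmd_line"

# Run it only where the client script is actually installed.
if command -v punchplatform-archive-client.sh >/dev/null 2>&1; then
    punchplatform-archive-client.sh "$@"
fi
```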

Examples

  • To get a status of pool 'mytenant-data' from a Ceph cluster named 'main':
punchplatform-archive-client.sh pool-status \
    --pool mytenant-data --cluster ceph_configuration:///etc/ceph/main.conf
  • To list topics of pool 'mytenant-data' from a Ceph cluster 'main' in January 2018:
punchplatform-archive-client.sh list-topics \
    --cluster ceph_configuration:///etc/ceph/main.conf --pool mytenant-data \
    --es-cluster-name es_search --es-index mytenant-archive \
    --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00
  • To list objects of pool 'mytenant-data' from a filesystem in January 2018 matching a specific ip using bloom filters:
punchplatform-archive-client.sh list-objects \
    --cluster file:///tmp/storage \
    --pool mytenant-data --topic apache \
    --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00 \
    --es-cluster-name es_search --es-index mytenant-archive \
    --match 128.78.18.47
  • To print the details of each topic (number of tuples, size, ...), add the --details option:
punchplatform-archive-client.sh list-topics \
    --cluster ceph_configuration:///etc/ceph/main.conf --pool mytenant-data \
    --es-cluster-name es_search --es-index mytenant-archive \
    --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00 --details
  • To get a status of a particular topic 'apache' in January 2018:
punchplatform-archive-client.sh topic-status \
    --cluster ceph_configuration:///etc/ceph/main.conf --pool mytenant-data --topic apache \
    --es-cluster-name es_search --es-index mytenant-archive \
    --from-date 2018-01-01T00:00:00+01:00 --to-date 2018-01-31T23:59:59+01:00 
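
The examples above do not cover dump-object. As a hedged sketch, the following saves the uncompressed content of one object to a local file. The object id is the sample value from the --object-id description, the other values are the placeholders used above, and --no-progress-bar is assumed to be a bare flag. The output file name is a hypothetical choice: the object's base name without its .gz suffix, since the content is printed uncompressed.

```shell
# Sample object id from the option description; in practice use an id
# returned by list-objects.
object_id='httpd/2019.12.19/puncharchive-httpd-E8M9Hm8BuW5Odu0CcGQx-0-500937.csv.gz'

# Local file name: strip the directory part and the .gz suffix.
out_file=$(basename "$object_id" .gz)

if command -v punchplatform-archive-client.sh >/dev/null 2>&1; then
    # --no-progress-bar keeps stdout clean for the redirection.
    punchplatform-archive-client.sh dump-object \
        --cluster ceph_configuration:///etc/ceph/main.conf --pool mytenant-data \
        --es-cluster-name es_search --es-index mytenant-archive \
        --object-id "$object_id" --no-progress-bar > "$out_file"
else
    # Client not installed: only report what would have been done.
    echo "would write $object_id to $out_file"
fi
```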

Files

  • punchplatform.properties:
    • This JSON file defines the punchplatform settings. It is used by punchplatform-kafka-topics.sh primarily to know the broker list.

See also

punchplatform-env