
Resource Manager

Abstract

The Resource Manager is a Gateway service that allows a user to manage data held in a remote storage. Using a REST client, the user can upload, download, list, delete, move, copy or register data files through the Gateway's /resources endpoint.


The Resource Manager relies on two types of backend:

  • metadata storage backend
  • data storage backend

How it works

A REST client can perform the following actions through the Resource Manager:

  • Upload a resource
  • Download a resource
  • Copy a resource
  • Move a resource
  • Delete a resource
  • List resources
  • Register an external resource

Every action is performed on behalf of a tenant defined by the Gateway.

Example: list all resources of mytenant

curl -L -X GET 'http://localhost:4242/v1/mytenant/resources/list?simplify=true'

{
  "metadata": [
    {
      "@timestamp": "2020-10-09T12:36:39.107Z",
      "name": "datasets/dataset_generator",
      "version": 2
    },
    {
      "@timestamp": "2020-10-09T12:37:44.785Z",
      "name": "datasets/batch_input",
      "version": 0
    }
  ],
  "name": "*",
  "message": "List resources",
  "tenant": "mytenant"
}
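The same request can also be issued from a script. The Python sketch below uses only the standard library: it builds the list URL following the pattern of the curl example and extracts the resource names from the response. The helper names list_url, resource_names and list_resources are hypothetical, and the host and port are simply those used in the example above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def list_url(base: str, tenant: str, simplify: bool = True) -> str:
    """Build the list endpoint URL, following the pattern of the curl example."""
    query = urlencode({"simplify": str(simplify).lower()})
    return f"{base}/v1/{tenant}/resources/list?{query}"


def resource_names(payload: dict) -> list[str]:
    """Return the resource names found in the "metadata" array of a list response."""
    return [entry["name"] for entry in payload.get("metadata", [])]


def list_resources(base: str, tenant: str) -> list[str]:
    """Query a running Gateway (hypothetical helper, not part of Punch)."""
    # urlopen follows redirects by default, like curl -L.
    with urlopen(list_url(base, tenant)) as response:
        return resource_names(json.load(response))
```

Against the response shown above, list_resources("http://localhost:4242", "mytenant") would return the two names datasets/dataset_generator and datasets/batch_input.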

Metadata Storage

The metadata storage is used to store information about the uploaded data.

Metadata follows a key-value model containing the following information:

Key | Description
--- | ---
schema | Version of the current metadata model
name | Name of the uploaded data. The name can be a resource path or a simple name. It may contain a file extension, but the extension has no effect on the data content. The name can differ from the storage.url of the data inside the data storage
timestamp | Time of the last update or creation of the data
version | Version of the uploaded data. This number increases each time the same data is uploaded again
tenant | Tenant owning the uploaded data. Each piece of data belongs to exactly one tenant
size | Size of the uploaded data, in bytes
storage.type | Type of data storage holding the uploaded data. For example, the file type means the data is stored in a filesystem
storage.url | Actual location of the uploaded data inside the data storage
storage.encoding | Optional. Encoding of the data inside the data storage
storage.compression | Optional. Compression of the data inside the data storage
storage.data | Optional. Version of the uploaded data. This number increases when the same data is uploaded multiple times
properties | Optional. Custom map in which the user can put additional key-value pairs. These properties are useful for filtering requested data
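The model above can be pictured as a typed record. The following Python sketch is illustrative only: the class names, the field types (for example an integer version and a string-valued properties map) and the has_properties helper are assumptions, not the Punch implementation. The helper shows how custom properties could be used to filter metadata.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Storage:
    type: str                          # e.g. "file" for a filesystem data storage
    url: str                           # real location of the data in the data storage
    encoding: Optional[str] = None     # optional data encoding
    compression: Optional[str] = None  # optional compression
    data: Optional[str] = None         # optional; the type is assumed here


@dataclass
class ResourceMetadata:
    schema: str        # version of the metadata model, e.g. "V1"
    name: str          # resource name or path
    timestamp: str     # last update or creation time
    version: int       # increases on each new upload of the same data
    tenant: str        # owning tenant
    size: int          # size in bytes
    storage: Storage
    properties: dict[str, str] = field(default_factory=dict)

    def has_properties(self, **expected: str) -> bool:
        """Hypothetical filter: True if every expected property matches."""
        return all(self.properties.get(k) == v for k, v in expected.items())
```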

Info

To configure the metadata storage of a deployment, see the deployment settings documentation.

Elasticsearch Metadata Storage

The Elasticsearch metadata storage type uses an index to store metadata.

Configuration:

  manager:
    metadata:
      elasticsearch:
        - hosts:
            - "localhost:9200"
          index_name: "resources-metadata"

The metadata index name is prefixed with the Punch Gateway tenant name so that resources are partitioned by tenant.
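As a rough illustration of this partitioning, the Python sketch below derives a per-tenant index name from the configured index_name. The exact way the tenant name is joined to index_name is not stated here: the "-" separator and the function name tenant_index are assumptions, so check the real index names in your cluster (for instance with the Elasticsearch _cat/indices API).

```python
def tenant_index(tenant: str, index_name: str, separator: str = "-") -> str:
    """Return a per-tenant metadata index name.

    The default separator is an assumption, not a documented Punch convention.
    """
    return f"{tenant}{separator}{index_name}"
```

With the configuration above, tenant_index("mytenant", "resources-metadata") yields "mytenant-resources-metadata" under that assumption.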

Each metadata entry is stored as an Elasticsearch document in which every model key is a field of the document.

Example:

{
  "schema": "V1",
  "name": "/my/resource",
  "timestamp": "1602230389",
  "version": "2",
  "tenant": "mytenant",
  "size": "2048",
  "storage": {
    "type": "file",
    "url": "/data/resources/mytenant/my_resource.hjson"
  },
  "properties": {
    "author": "bob",
    "editor": "vim"
  }
}
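The metadata table uses dotted keys such as storage.type, while the document above nests them under a storage object. A minimal Python sketch of that mapping, assuming a dot in a key always denotes nesting (the function name to_document is hypothetical):

```python
from typing import Any


def to_document(metadata: dict[str, Any]) -> dict[str, Any]:
    """Turn dotted model keys such as "storage.type" into nested objects."""
    document: dict[str, Any] = {}
    for key, value in metadata.items():
        *parents, leaf = key.split(".")
        target = document
        for part in parents:
            # Create the sub-object on first use, e.g. document["storage"].
            target = target.setdefault(part, {})
        target[leaf] = value
    return document
```

Keys without a dot, including an already nested properties map, are copied unchanged.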

Data Storage

The data storage is used to store the resource content itself.

Info

To configure the data storage of a deployment, see the deployment settings documentation.

File Data Storage

The file data storage type uses a filesystem to store the resources.

Configuration:

  manager:
    data:
      file:
        - root_path: "/tmp/punchplatform/manager/resources"

All data is stored under:

  • <root_path>/<resource_name>

The url of a resource is then a filesystem path into which the resource version is injected. For the first upload of a resource named resource, the url is:

  • <root_path>/0/resource

Info

The name of the resource can be a path: my/resource.txt is stored inside <root_path>/my/0/resource.txt
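Both examples follow one rule: the version number is inserted as a directory just before the last component of the resource name. The Python sketch below encodes that rule as inferred from these two examples only; the function name resource_url is hypothetical, and stripping a leading / from the name (so that a name like /my/resource stays under root_path) is an assumption.

```python
from pathlib import PurePosixPath


def resource_url(root_path: str, name: str, version: int) -> str:
    """Build <root_path>/<parent dirs>/<version>/<file name> for a resource."""
    # Drop a leading "/" so joining does not discard root_path (assumption).
    relative = PurePosixPath(name.lstrip("/"))
    # For a bare name, relative.parent is ".", which pathlib drops when joining.
    return str(PurePosixPath(root_path) / relative.parent / str(version) / relative.name)
```

For example, resource_url("/tmp/punchplatform/manager/resources", "my/resource.txt", 0) gives "/tmp/punchplatform/manager/resources/my/0/resource.txt".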

Metadata Transparency

Metadata transparency is the key concept of the Resource Manager.

Each client action automatically triggers the corresponding metadata operation, with no additional action from the client.

Every upload, whether of new data or of a new version of existing data, generates metadata. No action is needed from the client: the metadata is automatically generated and pushed to the metadata storage backend.

In the same way, any request to list, download, delete, copy or move a resource first looks up the resource metadata to obtain the resource information, then performs the action requested by the client.

Client action | Behavior
--- | ---
Upload | The data is uploaded, then a resource metadata entry is automatically created
Download | The resource metadata is automatically retrieved, then the data is returned using the resource url found in the metadata
Delete | The resource metadata is automatically retrieved, then the data is removed using the resource url found in the metadata. Finally, the resource metadata is removed too
List | The resource metadata is returned
Copy | The resource metadata is automatically retrieved, then the data is uploaded to the new location
Move | The resource metadata is automatically retrieved, then the data is copied to the new location. Finally, the original data is removed and its resource metadata is deleted
Register | The resource metadata is pushed to the metadata storage
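The behaviors in the table can be condensed into a toy, in-memory model. The Python sketch below is not the Punch implementation: the class name, the dictionary-backed storages, the url scheme, the "memory" storage type and the choice to give a copy its own metadata are all assumptions made for illustration. It shows the core idea of metadata transparency: the caller only passes resource names, and every metadata read or write happens inside the manager.

```python
import time
from dataclasses import dataclass, field


class ResourceNotFound(KeyError):
    """Raised when no metadata exists for a resource name."""


@dataclass
class InMemoryResourceManager:
    """Toy model for a single tenant: `data` plays the data storage
    (url -> bytes) and `metadata` plays the metadata storage (name -> dict)."""
    tenant: str
    data: dict[str, bytes] = field(default_factory=dict)
    metadata: dict[str, dict] = field(default_factory=dict)

    def _lookup(self, name: str) -> dict:
        # Download, delete, copy and move all start with a metadata lookup.
        if name not in self.metadata:
            raise ResourceNotFound(name)
        return self.metadata[name]

    def upload(self, name: str, content: bytes) -> dict:
        previous = self.metadata.get(name)
        version = previous["version"] + 1 if previous else 0
        url = f"{self.tenant}/{version}/{name}"  # hypothetical url scheme
        self.data[url] = content                 # store the data first
        return self.register(name, url, size=len(content), version=version)

    def register(self, name: str, url: str, size: int, version: int = 0) -> dict:
        # Only metadata is written; the data is expected to exist already.
        meta = {"name": name, "tenant": self.tenant, "version": version,
                "timestamp": time.time(), "size": size,
                "storage": {"type": "memory", "url": url}}
        self.metadata[name] = meta
        return meta

    def download(self, name: str) -> bytes:
        return self.data[self._lookup(name)["storage"]["url"]]

    def list_resources(self) -> list[dict]:
        return list(self.metadata.values())

    def delete(self, name: str) -> None:
        meta = self._lookup(name)
        del self.data[meta["storage"]["url"]]  # remove the data at the url
        del self.metadata[name]                # then remove its metadata

    def copy(self, name: str, new_name: str) -> dict:
        # Assumption: a copy is an ordinary upload, so it gets its own metadata.
        return self.upload(new_name, self.download(name))

    def move(self, name: str, new_name: str) -> dict:
        meta = self.copy(name, new_name)
        self.delete(name)  # remove the original data and metadata
        return meta
```

Uploading the same name twice yields versions 0 and 1, and a download then returns the content of the latest version. In this toy model, delete only removes the data referenced by the current metadata.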