Resource Manager

Abstract

The Resource Manager is a Gateway service that lets a user manage data held in remote storage. The user can upload, download, list, delete, move, copy or register data files with a REST client through the Gateway's /resources endpoint.

The Resource Manager relies on two types of backends:

metadata storage backend
data storage backend

How it works

A REST client can perform the following actions through the Resource Manager:

Upload a resource
Download a resource
Copy a resource
Move a resource
Delete a resource
List resources
Register an external resource

Every action is performed on behalf of a tenant defined by the Gateway; the tenant name is part of the request path.

Example: list all resources of mytenant

curl -L -X GET 'http://localhost:4242/v1/mytenant/resources/list?simplify=true'

{
  "metadata": [
    {
      "@timestamp": "2020-10-09T12:36:39.107Z",
      "name": "datasets/dataset_generator",
      "version": 2
    },
    {
      "@timestamp": "2020-10-09T12:37:44.785Z",
      "name": "datasets/batch_input",
      "version": 0
    }
  ],
  "name": "*",
  "message": "List resources",
  "tenant": "mytenant"
}
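The same request can be prepared and its response read from a script. The following is a minimal Python sketch, not an official client: `build_list_url` and `resource_versions` are hypothetical helper names, and the URL pattern and response shape are simply those of the example above.

```python
import json
from urllib.parse import urlencode


def build_list_url(base_url: str, tenant: str, simplify: bool = True) -> str:
    """Build the list URL, following the pattern of the curl example."""
    query = urlencode({"simplify": str(simplify).lower()})
    return f"{base_url}/v1/{tenant}/resources/list?{query}"


def resource_versions(response_body: str) -> dict:
    """Map each resource name to its version, from a list response body."""
    payload = json.loads(response_body)
    return {item["name"]: item["version"] for item in payload["metadata"]}


# Response body copied from the example above.
SAMPLE_RESPONSE = """
{
  "metadata": [
    {"@timestamp": "2020-10-09T12:36:39.107Z",
     "name": "datasets/dataset_generator", "version": 2},
    {"@timestamp": "2020-10-09T12:37:44.785Z",
     "name": "datasets/batch_input", "version": 0}
  ],
  "name": "*",
  "message": "List resources",
  "tenant": "mytenant"
}
"""
```

Calling `build_list_url("http://localhost:4242", "mytenant")` yields the URL used in the curl command, and `resource_versions(SAMPLE_RESPONSE)` gives one entry per listed resource.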

Metadata Storage

The metadata storage is used to store information about the uploaded data.

A metadata record is a key-value model containing the following fields:

key	description
`schema`	Version of the current metadata model
`name`	Name of the uploaded data. The name can be a resource path or a simple name. It may contain a file extension, but the extension has no effect on the data content. The name can differ from the `storage.url` of the data inside the data storage
`timestamp`	Time of the last data update or creation
`version`	Version of the uploaded data. This number increases when the same data is uploaded multiple times
`tenant`	Tenant of the uploaded data. Each piece of data belongs to exactly one tenant
`size`	Byte length of the uploaded data
`storage.type`	Type of the data storage of the uploaded data. For example, if `file` is the storage type, it means the data is contained inside a filesystem
`storage.url`	Real location of the uploaded data inside the data storage
`storage.encoding`	Optional. Data encoding inside the data storage
`storage.compression`	Optional. Data compression inside the data storage
`storage.data`	Optional.
`properties`	Optional. Custom map. The user can put additional key-value pairs in the metadata. These properties are useful for filtering data in search requests
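To make the model concrete, here is a minimal Python sketch of the fields listed above. It is an illustration only: the class names, the Python types chosen for each field, and the `has_properties` helper are assumptions, not part of the Resource Manager.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Storage:
    type: str                           # e.g. "file"
    url: str                            # real location inside the data storage
    encoding: Optional[str] = None      # optional
    compression: Optional[str] = None   # optional
    data: Optional[str] = None          # optional


@dataclass
class ResourceMetadata:
    schema: str       # version of the metadata model
    name: str         # resource path or simple name
    timestamp: str    # last update or creation time
    version: int      # increases on each new upload of the same data
    tenant: str       # owning tenant
    size: int         # byte length of the data
    storage: Storage
    properties: Dict[str, str] = field(default_factory=dict)


def has_properties(meta: ResourceMetadata, **wanted: str) -> bool:
    """Hypothetical filter: True if every wanted key-value pair is present."""
    return all(meta.properties.get(key) == value for key, value in wanted.items())
```

For example, a record whose `properties` are `{"author": "bob", "editor": "vim"}` satisfies `has_properties(meta, author="bob")` but not `has_properties(meta, author="alice")`.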

Info

To configure the metadata storage for a deployment, see the deployment settings documentation.

Elasticsearch Metadata Storage

The Elasticsearch metadata storage type uses an index to store metadata.

Configuration:

  manager:
    metadata:
      elasticsearch:
        - hosts:
            - "localhost:9200"
          index_name: "resources-metadata"

The metadata index name is prefixed with the Punch Gateway's tenant name so that resources are partitioned per tenant.

Each metadata record is stored as an Elasticsearch document in which every key of the model is a field.

Example:

{
  "schema": "V1",
  "name": "/my/resource",
  "timestamp": "1602230389",
  "version": "2",
  "tenant": "mytenant",
  "size": "2048",
  "storage": {
    "type": "file",
    "url": "/data/resources/mytenant/my_resource.hjson"
  },
  "properties": {
    "author": "bob",
    "editor": "vim"
  }
}

Data Storage

The data storage is used to store the resource content itself.

Info

To configure the data storage for a deployment, see the deployment settings documentation.

File Data Storage

The file data storage type uses a filesystem to store the resources.

Configuration:

  manager:
    data:
      file:
        - root_path: "/tmp/punchplatform/manager/resources"

Each resource is stored under:

<root_path>/<resource_name>

The URL of a resource is a filesystem path into which the resource version is injected. For the first upload of a resource named resource, the URL is:

<root_path>/0/resource

Info

The name of the resource can be a path: on its first upload, my/resource.txt is stored in <root_path>/my/0/resource.txt
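The version injection described above can be sketched in a few lines of Python. This is an illustration inferred from the two examples in this section, not the Gateway's actual code: `versioned_url` is a hypothetical name, and stripping a leading slash from the resource name is an assumption.

```python
import posixpath


def versioned_url(root_path: str, resource_name: str, version: int) -> str:
    """Insert the version just before the last component of the resource name."""
    # Assumption: a leading "/" in the name is ignored, so the result
    # always stays under root_path.
    directory, filename = posixpath.split(resource_name.lstrip("/"))
    return posixpath.join(root_path, directory, str(version), filename)
```

With the configuration shown earlier, `versioned_url("/tmp/punchplatform/manager/resources", "my/resource.txt", 0)` returns `/tmp/punchplatform/manager/resources/my/0/resource.txt`.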

Metadata Transparency

Metadata transparency is the key concept of the Resource Manager.

Each client action automatically triggers the matching metadata operation; the client has nothing extra to do.

Every newly uploaded piece of data, or new version of existing data, triggers metadata generation. No action is needed from the client: the metadata is generated automatically and pushed into the metadata storage backend.

In the same way, any request to list, download, delete, copy or move a resource first looks up the resource metadata to obtain the resource information, and then performs the action requested by the client.

Client action	Resulting operations
Upload	The data is uploaded, then the resource metadata is automatically created
Download	The resource metadata is looked up automatically, then the data is returned using the resource URL stored in the metadata
Delete	The resource metadata is looked up automatically, then the data is removed using the resource URL stored in the metadata. Finally, the resource metadata is removed too
List	The resource metadata is returned
Copy	The resource metadata is looked up automatically, then the data is uploaded to the new location
Move	The resource metadata is looked up automatically, then the data is copied to the new location. Finally, the original data is removed and its resource metadata is deleted
Register	The resource metadata is pushed into the metadata storage
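The flow in the table above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not the Gateway's implementation: the class name, the `mem://` URL scheme and the `memory` storage type are hypothetical, both backends are plain dictionaries, and only the latest version of each resource is tracked.

```python
class InMemoryResourceManager:
    """Toy model: every data operation goes through the metadata."""

    def __init__(self):
        self.metadata = {}  # (tenant, name) -> metadata dict
        self.data = {}      # url -> bytes

    def upload(self, tenant, name, content):
        # Store the data, then create the metadata automatically.
        previous = self.metadata.get((tenant, name))
        version = 0 if previous is None else previous["version"] + 1
        url = f"mem://{tenant}/{name}/{version}"  # hypothetical URL scheme
        self.data[url] = content
        meta = {
            "name": name,
            "tenant": tenant,
            "version": version,
            "size": len(content),
            "storage": {"type": "memory", "url": url},
        }
        self.metadata[(tenant, name)] = meta
        return meta

    def download(self, tenant, name):
        # Look up the metadata first, then fetch the data by its URL.
        meta = self.metadata[(tenant, name)]
        return self.data[meta["storage"]["url"]]

    def delete(self, tenant, name):
        # Look up the metadata, remove the data, then remove the metadata.
        meta = self.metadata[(tenant, name)]
        del self.data[meta["storage"]["url"]]
        del self.metadata[(tenant, name)]

    def list_resources(self, tenant):
        # Listing only reads the metadata backend.
        return [m for (t, _), m in self.metadata.items() if t == tenant]

    def copy(self, tenant, source, target):
        # Read the source through its metadata, then upload to the new name.
        return self.upload(tenant, target, self.download(tenant, source))

    def move(self, tenant, source, target):
        # Copy to the new location, then delete the original and its metadata.
        meta = self.copy(tenant, source, target)
        self.delete(tenant, source)
        return meta
```

In this model the client never touches the metadata dictionary directly: uploading the same name twice bumps `version` from 0 to 1, and a move leaves a single metadata entry under the new name.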