CephFS Distributed File System

CephFS Rationale

The CephFS distributed file system is a feature built on top of Ceph object storage pools.

It provides your platform with shared storage for storing and sharing data among all platform components through standard POSIX interfaces. Examples are:

  • storing punchplatform topology resource files
  • saving data extracted from the object storage or from Elasticsearch, so that it is easily accessible to remote applications

In contrast to traditional NFS-mounted filesystems, CephFS is scalable and resilient, and requires only commodity local disks.

CephFS configuration overview

CephFS instances are deployed using the punchplatform deployer. Refer to the punchplatform_deployment.settings file documentation for instructions on setting up a CephFS cluster.

Each Ceph filesystem relies on 2 underlying object pools, each deployed by the punchplatform deployer:

  • a data object pool that contains the content of the files stored in the filesystem
  • a metadata object pool that holds the directory structure and the metadata associated with the stored files

CephFS instances can be mounted either as a kernel-level mount (by a Linux admin account), or as a user-level partition (Linux FUSE). Only the second method is supported by the punchplatform deployer, because it is easier to configure.

To mount a partition for client applications, the mount command line requires a client Ceph keyring. The punchplatform deployer deploys this keyring on every client node that requires it.

These are typically the Storm or Spark nodes from which read/write access to the filesystem is required.

CephFS deployment steps

To deploy a CephFS instance, you must declare several items in the ceph section of the punchplatform_deployment.settings file. A complete example follows.

  • Configure two or more metadata server (MDS) nodes. For example:
"ceph": {
 "clusters": {
   "main": {

       [...]

       "metadataservers": {
           "ceph1": {
               "id": 0,
               "production_address": "ceph1.prod"
           },
           "ceph2": {
               "id": 1,
               "production_address": "ceph2.prod"
           }
       },         

       [...]
  • Declare two object pools dedicated to this CephFS instance. One of them must be of type 'replicated' and will store the directory structure and file metadata. The other can be of any type (replicated or erasure-coded) and will contain the data.

    For example:

"ceph": {
    "clusters": {
        "main": {

            [...]

            "pools": {

                [...]

                "mytenant-fsmeta": {
                    "type": "replicated",
                    "pg_num": 32,
                    "pgp_num": 32,
                    "replication_factor" : 3
                },
                "mytenant-fsdata": {
                    "type": "erasure-coded",
                    "pg_num": 64,
                    "pgp_num": 64
                }

                [...]
  • Declare the CephFS instance:
"ceph": {
  "clusters": {
      "main": {

          [...]

          "filesystems" : {
            "mytenant-fs": {
              "metadata_pool" : "mytenant-fsmeta",
              "data_pool" : "mytenant-fsdata"
            }
          },
  • Declare the Storm cluster(s) whose worker nodes need to access the CephFS. For example:
"ceph": {
  "clusters": {
      "main": {
          [...]
          "storm_clusters_clients" : ["main"]
          [...]
  • Declare the admin/operator/shiva nodes that need to access the CephFS. For example:
"ceph": {
  "clusters": {
      "main": {
          [...]
          "admins": [
              "admserver1",
              "admserver2",
              "shivaserver1",
              "shivaserver2"
          ],
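
Once the deployer has applied settings like the ones above, the result can be checked from an admin node. The following is a minimal sketch, not part of the punchplatform tooling: it assumes the `ceph` command line tool is installed on that node and that the admin cluster configuration is /etc/ceph/main.conf (the path used for admin mounts in this page). The function name `cephfs_status` and the `CEPH_CONF` variable are hypothetical.

```shell
# Assumed path of the admin-side cluster configuration; override if needed.
CEPH_CONF="${CEPH_CONF:-/etc/ceph/main.conf}"

# List the declared filesystems, the metadata server state and the pools.
# Stops at the first command that fails.
cephfs_status() {
    ceph -c "$CEPH_CONF" fs ls &&
    ceph -c "$CEPH_CONF" mds stat &&
    ceph -c "$CEPH_CONF" osd lspools
}

# Usage on an admin node (may require sudo to read the admin keyring):
#   cephfs_status
```

With the example settings, the filesystem list would be expected to mention mytenant-fs together with its mytenant-fsmeta and mytenant-fsdata pools.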

You must now automate the mounting of the filesystem on each server that needs to read or write data in the Ceph filesystem. This is explained in the Auto-mounting CephFS at boot time section.

Mounting CephFS partition using FUSE

To mount a CephFS partition with the default Linux configuration, you need a Linux sudoer account. The mount command is the following:

$ ceph-fuse <mount point> -c <ceph cluster configuration>

When mounting on a client node, the mount operation is normally conducted using the client Ceph configuration and keyring deployed by the punchplatform deployer. This does not allow administrative operations in the Ceph cluster, but does allow mounting a CephFS partition:

$ sudo mkdir /data/somepartition
$ sudo ceph-fuse  /data/somepartition -c /etc/ceph/main-client.conf --id client

When mounting on a punchplatform operator node, i.e. a Ceph admin node, the mount operation is normally conducted using the admin Ceph configuration and keyring:

$ sudo mkdir /data/somepartition
$ sudo ceph-fuse  /data/somepartition -c /etc/ceph/main.conf 
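
After either mount command, it can be useful to confirm that the directory is really backed by CephFS before applications write to it. The sketch below is one way to do so. It assumes a Linux mount table in the /proc/mounts format, and it assumes that ceph-fuse mounts are reported with the `fuse.ceph-fuse` type, which should be checked against the output on your own nodes. The function name `mount_fstype` is hypothetical.

```shell
# Print the filesystem type of a mount point, read from a mount table in
# /proc/mounts format (second argument, defaults to /proc/mounts).
# Prints nothing and returns non-zero if the path is not a mount point.
mount_fstype() {
    awk -v mp="$1" '
        $2 == mp { t = $3 }                  # keep the last matching entry
        END { if (t == "") exit 1; print t }
    ' "${2:-/proc/mounts}"
}

# Example: warn when the directory is not a ceph-fuse mount.
# The expected type string is an assumption, see the note above.
#   [ "$(mount_fstype /data/somepartition)" = "fuse.ceph-fuse" ] \
#       || echo "/data/somepartition is not a CephFS mount" >&2
```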

Auto-mounting CephFS at boot time

In order to automatically mount the required mount points at boot time, the privileged user can configure an entry in /etc/fstab.

The following example is for a typical client node:

#DEVICE PATH       TYPE      OPTIONS
none    /mnt/mytenant-cephfs  fuse.ceph ceph.id=client,ceph.conf=/etc/ceph/main-client.conf,_netdev,defaults  0 0

The following example is for a typical admin/operator node:

#DEVICE PATH       TYPE      OPTIONS
none    /mnt/mytenant-cephfs  fuse.ceph ceph.id=admin,ceph.conf=/etc/ceph/main.conf,_netdev,defaults  0 0
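
An fstab entry can be checked without waiting for the next reboot. The sketch below is a small helper that looks for a `fuse.ceph` entry for a given mount point in an fstab-format file. The function name `fstab_has_cephfs` is hypothetical, and the suggested follow-up `mount` call assumes the ceph-fuse mount helper is installed on the node.

```shell
# Return 0 if an fstab-format file (second argument, defaults to /etc/fstab)
# declares a fuse.ceph entry for the given mount point. Comment lines are
# skipped.
fstab_has_cephfs() {
    awk -v mp="$1" '
        $1 ~ /^#/ { next }
        $2 == mp && $3 == "fuse.ceph" { found = 1 }
        END { exit !found }
    ' "${2:-/etc/fstab}"
}

# Typical use on a node: mount through the fstab entry right away,
# which exercises the same options that will be used at boot time.
#   fstab_has_cephfs /mnt/mytenant-cephfs && sudo mount /mnt/mytenant-cephfs
```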

Warning

  • Do not forget to create the mount point directory beforehand.
  • Ensure that user and group ids are consistent on all the servers that mount the same filesystem, so that access restrictions have the same meaning on all these servers.
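
Both points can be checked by hand on each server. The following sketch creates nothing by itself: it only defines a helper that extracts the numeric ids of one account from passwd-format input, so that the output can be compared across servers. The function name `passwd_ids` and the account name `someuser` are hypothetical.

```shell
# Read passwd-format lines on stdin and print "user uid gid" for one account.
passwd_ids() {
    awk -F: -v u="$1" '$1 == u { print $1, $3, $4 }'
}

# On every server that mounts the filesystem:
#   sudo mkdir -p /mnt/mytenant-cephfs            # create the mount point
#   getent passwd someuser | passwd_ids someuser  # compare across servers
```

If the printed uid or gid differs between two servers, file ownership and permissions in the shared filesystem will not be interpreted the same way on both.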

For more details, refer to the CephFS documentation.