Persistence, Backup and Restore

Persistence allows individual members and whole clusters to recover data by persisting map entries, JCache data, and streaming job snapshots on disk. Members can use persisted data to recover from a planned shutdown (including rolling upgrades), a sudden cluster-wide crash, or a single member failure.

This topic assumes that you know about Persistence in Hazelcast. To learn about Persistence, see the Platform documentation.

The Operator supports two options for enabling Persistence: PVC and HostPath. We recommend using PVC, so the examples on this page use this option. If you need to use HostPath, see HostPath Support for Persistence.

Before you back up and restore Hazelcast data, you need to enable Persistence in the Hazelcast resource. There are two options for backup and restore: Local (the default) and External.

Enabling Local Persistence

  1. Create the Hazelcast resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"  (1)
        clusterDataRecoveryPolicy: "FullRecoveryOnly"  (2)
        pvc:
          accessModes: ["ReadWriteOnce"]
          requestStorage: 20Gi  (3)
    1 Base directory of the backup data.
    2 Cluster recovery policy.
    3 Size of the PersistentVolumeClaim (PVC) where Hazelcast data is persisted.
    Note Make sure to calculate the total disk space that you will use. The total used disk space may be larger than the size of in-memory data, depending on how many backups you take.
  2. Apply the resource:

    kubectl apply -f ./hazelcast-persistence.yaml
  3. Check that Hazelcast members are ready:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast
    NAME          READY   STATUS    RESTARTS   AGE
    hazelcast-0   1/1     Running   0          2m
    hazelcast-1   1/1     Running   0          1m
    hazelcast-2   1/1     Running   0          1m
  4. Check that Hazelcast PVCs are created:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       2m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       1m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       1m
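
As an additional check, you can query the Hazelcast custom resource itself; the exact status columns shown depend on your Operator version:

kubectl get hazelcast hazelcast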

Triggering Local Backups

After Persistence is enabled for the cluster, you can trigger backups using a HotBackup resource.

  1. Create the HotBackup resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
  2. Apply the resource:

    kubectl apply -f ./hot-backup.yaml
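
The backup runs asynchronously, so the command returns immediately. You can watch the HotBackup resource until its status changes:

kubectl get hotbackup hot-backup -w

See Checking the Status of a Backup below for the expected output.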

Scheduling Backups

You can schedule regular backups using the spec.schedule field of a HotBackup resource.

  1. Create the HotBackup resource with a schedule field:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
      schedule: "* 0-23/6 * * *"

    This field takes a valid cron expression. For example:

    30 10 * * *

    At 10:30 AM every day

    0 0 1,15,25 * *

    At midnight on the 1st, 15th, and 25th of each month

    @monthly

    Once a month, at midnight on the first day of the month

    For the full list of supported expressions, see the library documentation.

  2. Apply the resource:

    kubectl apply -f ./hot-backup-scheduled.yaml
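
Note that the schedule in the example above, * 0-23/6 * * *, matches every minute of hours 0, 6, 12, and 18. If you want exactly one backup every six hours, a schedule like the following (an alternative illustration, not part of the original example) does that:

schedule: "0 */6 * * *"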

Checking the Status of a Backup

To check the status of a backup, run the following command:

kubectl get hotbackup hot-backup

The status of the backup is displayed in the output.

NAME         STATUS
hot-backup   Success
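
If you are scripting around backups, you can also block until the backup finishes. A sketch, assuming the HotBackup resource exposes its state under .status.state (the field name may differ between Operator versions) and kubectl v1.23 or later for JSONPath waits:

kubectl wait --for=jsonpath='{.status.state}'=Success hotbackup/hot-backup --timeout=300s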

Restoring from Local Backups

To restore a cluster from local backups, simply reapply the Hazelcast resource; the new pods will be bound to the existing PVCs that contain the persisted data.

  1. Delete the Hazelcast cluster:

    kubectl delete hazelcast hazelcast
  2. Check that the PVCs are still available:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       6m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       5m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       5m
  3. Reapply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence.yaml
  4. See that the same PVCs are bound to the new pods:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast -ojsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}'

    You should see something like the following:

    hot-restart-persistence-hazelcast-0 hot-restart-persistence-hazelcast-1 hot-restart-persistence-hazelcast-2

Enabling External Persistence

In some cases, keeping the data only in PVCs and restoring from them is not enough, for example when you move data between two Kubernetes clusters. You can use external storage to make your backups portable.

To enable external persistence, you need two additional configurations: backupType and the agent configuration. When external persistence is used, the Backup Agent container is deployed in the same Pod as the Hazelcast container.

  1. Create the Hazelcast resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        backupType: "External" (1)
        baseDir: "/data/hot-restart/"
        clusterDataRecoveryPolicy: "FullRecoveryOnly"
        pvc:
          accessModes: ["ReadWriteOnce"]
          requestStorage: 20Gi
      agent: (2)
        repository: hazelcast/platform-operator-agent
    1 Where the backup data will be stored: either Local or External. The default value is Local.
    2 The agent responsible for backing data up to the external storage. The agent configuration is optional; if you set backupType to External and do not pass the agent configuration, the Operator uses the latest stable version of the agent.
  2. Apply the resource:

    kubectl apply -f ./hazelcast-persistence-agent.yaml
  3. Check that Hazelcast members are ready:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast
    NAME          READY   STATUS    RESTARTS   AGE
    hazelcast-0   2/2     Running   0          2m55s
    hazelcast-1   2/2     Running   0          2m15s
    hazelcast-2   2/2     Running   0          99s
  4. Check that Hazelcast PVCs are created:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       2m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       1m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       1m

Triggering External Backups

To trigger an external backup, you need to configure a bucket URI and a secret to tell Hazelcast where to store backup data and how to authenticate.
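
The scheme of the bucket URI selects the storage provider. For example (the bucket names below are hypothetical):

bucketURI: "s3://operator-backup"       # AWS S3
bucketURI: "gs://operator-backup"       # Google Cloud Storage
bucketURI: "azblob://operator-backup"   # Azure Blob Storage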

  1. Create the HotBackup resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
      bucketURI: "s3://operator-backup" (1)
      secret: "br-secret-s3" (2)
    1 The bucket URI where the backup data will be stored.
    2 Name of the secret with credentials for accessing the given bucket URI.
  2. Create the secret for your cloud provider:

    AWS

    See AWS Session to learn about the authentication procedure.

    kubectl create secret generic <secret-name> --from-literal=region=<region> \
    	--from-literal=access-key-id=<access-key-id> \
    	--from-literal=secret-access-key=<secret-access-key>

    GCP

    See Application Default Credentials to learn about the authentication procedure.

    kubectl create secret generic <secret-name> --from-file=google-credentials-path=<service_account_json_file>

    Azure

    See Azure Storage Account Keys to learn about the authentication procedure.

    kubectl create secret generic <secret-name> \
    	--from-literal=storage-account=<storage-account> \
    	--from-literal=storage-key=<storage-key>
  3. Apply the resource:

    kubectl apply -f ./hot-backup-agent.yaml
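
Judging by the prefix format used in the restore example later on this page (hazelcast/2022-06-08-17-01-20/), the agent stores each backup under a <cluster-name>/<timestamp>/ prefix in the bucket. Assuming the AWS CLI is installed and authenticated, you can list the available backups for the S3 case:

aws s3 ls s3://operator-backup/hazelcast/

The timestamped folder names in the output are what you reference in the restore bucketURI prefix.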

Scheduling External Backups

You can schedule external backups using the spec.schedule field of a HotBackup resource.

  1. Create the HotBackup resource with a schedule field:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
      bucketURI: "s3://operator-backup"
      secret: "br-secret-s3"
      schedule: "* 0-23/6 * * *"

    This field takes a valid cron expression. For example:

    30 10 * * *

    At 10:30 AM every day

    0 0 1,15,25 * *

    At midnight on the 1st, 15th, and 25th of each month

    @monthly

    Once a month, at midnight on the first day of the month

    For the full list of supported expressions, see the library documentation.

  2. Apply the resource:

    kubectl apply -f ./hot-backup-agent-scheduled.yaml

Checking the Status of a Backup

To check the status of a backup, run the following command:

kubectl get hotbackup hot-backup

The status of the backup is displayed in the output.

NAME         STATUS
hot-backup   Success

Restoring from External Backups

To restore a cluster from external backups, you must enable external restore by adding the restore configuration under persistence. As with external persistence, you may pass the agent configuration; if you do not, the default agent configuration is used. When the external restore mechanism is used, the Restore Agent container is deployed in the same Pod as the Hazelcast container and starts as an initContainer before the Hazelcast container.

  1. Delete the Hazelcast cluster:

    kubectl delete hazelcast hazelcast
  2. Recreate the Hazelcast resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"
        clusterDataRecoveryPolicy: "FullRecoveryOnly"
        pvc:
          accessModes: ["ReadWriteOnce"]
          requestStorage: 20Gi
        restore:
          bucketURI: "s3://operator-backup?prefix=hazelcast/2022-06-08-17-01-20/" (1)
          secret: br-secret-s3 (2)
      agent: (3)
        repository: hazelcast/platform-operator-agent
    1 The bucket URI where backup data will be restored from.
    2 Name of the secret with credentials for accessing the given Bucket URI.
    3 The agent responsible for restoring data from the external storage. The agent configuration is optional; if you configure restore under persistence and do not pass the agent configuration, the Operator uses the latest stable version of the agent.
  3. If you have not created the secret yet, create it in the same way as described in Triggering External Backups.

  4. Apply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence-restore.yaml
  5. See that the same PVCs are bound to the new pods:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast -ojsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}'

    You should see something like the following:

    hot-restart-persistence-hazelcast-0 hot-restart-persistence-hazelcast-1 hot-restart-persistence-hazelcast-2
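
You can also confirm that the Restore Agent ran as an initContainer in each pod. A quick check; the initContainer name depends on the Operator version:

kubectl get pod hazelcast-0 -o jsonpath='{.spec.initContainers[*].name}'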

Data Recovery Timeout

To choose a data recovery timeout, use the dataRecoveryTimeout field. It takes an integer value representing the timeout in seconds, which the Operator uses to set the validation-timeout-seconds and data-load-timeout-seconds Hazelcast Persistence options.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    dataRecoveryTimeout: 600
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
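
For reference, this maps to the following Hazelcast Platform configuration. A sketch of the equivalent Platform YAML, based on the option names mentioned above:

hazelcast:
  persistence:
    validation-timeout-seconds: 600
    data-load-timeout-seconds: 600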

Choosing a Cluster Recovery Policy

To decide how a cluster should behave when one or more members cannot rejoin after a cluster-wide restart, you can define one of the following cluster recovery policies. The Operator supports all of the policies available in the Hazelcast Platform cluster-data-recovery-policy configuration. For complete descriptions and advice on choosing a policy, see the Platform documentation.

FullRecoveryOnly

Does not allow partial start of the cluster.

PartialRecoveryMostRecent

Allows partial start with the members that have the most recent partition table.

PartialRecoveryMostComplete

Allows partial start with the members that have the most complete partition table.
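
To select one of the partial recovery policies, set it in the persistence section of the Hazelcast resource. An illustrative excerpt, reusing the fields from the earlier examples:

persistence:
  baseDir: "/data/hot-restart/"
  clusterDataRecoveryPolicy: "PartialRecoveryMostRecent"
  pvc:
    accessModes: ["ReadWriteOnce"]
    requestStorage: 20Gi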

Configuring Force-Start

If you use the FullRecoveryOnly policy, you can configure the Operator to detect failed Hazelcast members and automatically trigger a force-start. The Operator will trigger a force-start only if the cluster is in a PASSIVE state.

Warning The cluster loses all persisted data after a force-start.

Enable autoForceStart:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    autoForceStart: true

HostPath Support for Persistence

You can also use HostPath to enable persistence.

Warning HostPath support is discouraged for production environments for the reasons mentioned in the Kubernetes documentation.
Warning HostPath support expects the cluster size to be equal to the number of Kubernetes nodes, with pods distributed equally among the nodes. You can manage how pods are distributed among nodes by setting the topologySpreadConstraints field, which is described in Scheduling Hazelcast Pods.
  1. Create the Hazelcast resource with the clusterSize equal to the number of Kubernetes nodes and set the appropriate topologySpreadConstraints:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"
        clusterDataRecoveryPolicy: "FullRecoveryOnly"
        hostPath: "/tmp/hazelcast"
      scheduling:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: hazelcast
  2. Apply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence-hostpath.yaml
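
To verify that the pods are spread across the nodes as expected, check the NODE column:

kubectl get pods -l app.kubernetes.io/instance=hazelcast -o wide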