Persistence, Backup and Restore

Persistence allows individual members and whole clusters to recover data by persisting map entries, JCache data, and streaming job snapshots on disk. Members can use persisted data to recover from a planned shutdown (including rolling upgrades), a sudden cluster-wide crash, or a single member failure.

This topic assumes that you know about Persistence in Hazelcast. To learn about Persistence, see the Platform documentation.

The Operator supports two options for enabling Persistence: PVC and HostPath. We recommend using PVC, so the examples on this page use this option. If you need to use HostPath, see HostPath Support for Persistence.

Enabling Persistence

Before you can back up and restore Hazelcast data, you need to enable Persistence in the Hazelcast resource.

  1. Create the Hazelcast resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.1-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"  # (1)
        clusterDataRecoveryPolicy: "FullRecoveryOnly"  # (2)
        pvc:
          accessModes: ["ReadWriteOnce"]
          requestStorage: 20Gi  # (3)
    1 Base directory of the backup data.
    2 Cluster recovery policy.
    3 Size of the PersistentVolumeClaim (PVC) where Hazelcast data is persisted.
    Make sure to calculate the total disk space that you will use. The total used disk space may be larger than the size of in-memory data, depending on how many hot backups you take.
  2. Apply the resource:

    kubectl apply -f ./hazelcast-persistence.yaml
  3. Check that Hazelcast members are ready:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast
    NAME          READY   STATUS    RESTARTS   AGE
    hazelcast-0   1/1     Running   0          2m
    hazelcast-1   1/1     Running   0          1m
    hazelcast-2   1/1     Running   0          1m
  4. Check that Hazelcast PVCs are created:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       2m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       1m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       1m
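
The two checks above can also be scripted, for example in a deployment pipeline. The following is a minimal sketch and not part of the Operator: the function names wait_for_members and unbound_pvcs are hypothetical, and the 300-second default timeout is an arbitrary assumption. It relies only on the standard kubectl wait and kubectl get -o jsonpath commands.

```shell
# Hypothetical helpers; the names and the default timeout are illustrative only.

# Block until every pod of the given Hazelcast resource reports Ready.
wait_for_members() {
  local name="$1" timeout="${2:-300s}"
  kubectl wait --for=condition=Ready pod \
    -l "app.kubernetes.io/instance=${name}" --timeout="${timeout}"
}

# Print the names of PVCs that are not in the Bound phase.
# Prints nothing when all PVCs are bound.
unbound_pvcs() {
  local name="$1"
  kubectl get pvc -l "app.kubernetes.io/instance=${name}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' \
    | awk '$2 != "Bound" { print $1 }'
}
```

For the resource on this page, you would run wait_for_members hazelcast, followed by unbound_pvcs hazelcast.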

Triggering a Single Hot Backup

After you enable Persistence for the cluster, you can trigger a hot backup with a HotBackup resource.

  1. Create the HotBackup resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
  2. Apply the resource:

    kubectl apply -f ./hot-backup.yaml
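
To confirm that the backup was triggered, you can read the resource back from the cluster. This is a minimal sketch under two assumptions: that the custom resource can be addressed as hotbackup (the lowercased kind), and that the Operator reports progress under the status field. The fields inside status are not described on this page, so the hypothetical helper hotbackup_status prints the whole object instead of a specific field.

```shell
# Hypothetical helper: print the raw status object of a HotBackup resource.
hotbackup_status() {
  local name="$1"
  kubectl get hotbackup "${name}" -o jsonpath='{.status}'
}
```

For the resource created above, call hotbackup_status hot-backup.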

Scheduling Hot Backups

You can schedule regular hot backups with the spec.schedule field of a HotBackup resource.

  1. Create the HotBackup resource with a schedule field:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
      schedule: "0 0-23/6 * * *"  # at minute 0 of every sixth hour (00:00, 06:00, 12:00, 18:00)

    This field takes a valid cron expression. For example:

    30 10 * * *

    At 10:30 AM every day.

    0 0 1,15,25 * *

    At midnight on the 1st, 15th, and 25th of each month.

    @monthly

    Once a month, at midnight on the first day of the month.

    For the full list of supported expressions, see the library documentation.

  2. Apply the resource:

    kubectl apply -f ./hot-backup-scheduled.yaml
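
Cron expressions are easy to misread, so it can help to label the fields before you apply a schedule. The helper below is a rough sanity check only, assuming the standard five-field layout (minute, hour, day of month, month, day of week) used in the examples above. The name describe_cron is hypothetical, and the function does not validate the values inside each field.

```shell
# Hypothetical helper: label the five fields of a standard cron expression.
# Accepts @-style macros as a single field; fails on any other field count.
describe_cron() {
  printf '%s\n' "$1" | awk '
    NF == 1 && $1 ~ /^@/ { print "macro=" $1; next }
    NF != 5 { print "expected 5 fields, got " NF; exit 1 }
    { printf "minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n", $1, $2, $3, $4, $5 }'
}
```

For example, describe_cron '30 10 * * *' labels 30 as the minute and 10 as the hour. A * in the minute field matches every minute of the selected hours, which is rarely what you want for a backup schedule.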

Restoring from Hot Backups

To restore a cluster from hot backups, reapply the Hazelcast resource. The recreated cluster has access to the PVCs that contain the persisted data.

  1. Delete the Hazelcast cluster:

    kubectl delete hazelcast hazelcast
  2. Check that the PVCs are still available:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       6m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       5m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       5m
  3. Reapply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence.yaml
  4. Check that the same PVCs are bound to the new pods:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast -ojsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}'

    You should see something like the following:

    hot-restart-persistence-hazelcast-0 hot-restart-persistence-hazelcast-1 hot-restart-persistence-hazelcast-2
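
Rather than comparing the names by eye, you can compare the claims mounted by the pods with the PVCs that carry the same label. This is a minimal sketch that reuses the jsonpath query from step 4. The function names pod_claims, pvc_names, and claims_match are hypothetical.

```shell
# Hypothetical helpers for comparing mounted claims with existing PVCs.

# Print the PVC claim names mounted by the pods, one per line, sorted.
pod_claims() {
  kubectl get pods -l "app.kubernetes.io/instance=$1" \
    -o jsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}' \
    | tr ' ' '\n' | sort
}

# Print the names of the PVCs with the same label, one per line, sorted.
pvc_names() {
  kubectl get pvc -l "app.kubernetes.io/instance=$1" \
    -o jsonpath='{.items[*].metadata.name}' \
    | tr ' ' '\n' | sort
}

# Succeed only if both lists are identical.
claims_match() {
  [ "$(pod_claims "$1")" = "$(pvc_names "$1")" ]
}
```

Running claims_match hazelcast && echo "same PVCs" prints the message only when every PVC is mounted by a pod and no pod mounts a different claim.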

Data Recovery Timeout

To set a data recovery timeout, use the dataRecoveryTimeout field. The field takes an integer number of seconds, and the Operator uses this value to set both the validation-timeout-seconds and data-load-timeout-seconds Hazelcast Persistence options.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.1-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    dataRecoveryTimeout: 600
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
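
After you apply the resource, you can read the stored value back to confirm that it was accepted. This is a minimal sketch: the helper name recovery_timeout is hypothetical, and the jsonpath simply follows the field layout of the example above.

```shell
# Hypothetical helper: print spec.persistence.dataRecoveryTimeout
# of the given Hazelcast resource.
recovery_timeout() {
  kubectl get hazelcast "$1" \
    -o jsonpath='{.spec.persistence.dataRecoveryTimeout}'
}
```

For the example above, recovery_timeout hazelcast should print the configured number of seconds.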

Choosing a Cluster Recovery Policy

To decide how a cluster should behave when one or more members cannot rejoin after a cluster-wide restart, you can define one of the following cluster recovery policies. The Operator supports all the policies in the Hazelcast Platform cluster-data-recovery-policy configuration options. For complete descriptions and advice on choosing a policy, see the Platform documentation.

FullRecoveryOnly

Does not allow partial start of the cluster.

PartialRecoveryMostRecent

Allows partial start with the members that have the most recent partition table.

PartialRecoveryMostComplete

Allows partial start with the members that have the most complete partition table.
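
The Platform documentation lists these policies under upper-case names. When you look up a policy there, a small mapping can help. The sketch below assumes that each Operator value corresponds to the Platform constant with the same words. The function name platform_policy_name is hypothetical.

```shell
# Hypothetical helper: map an Operator policy value to the name used in
# the Platform cluster-data-recovery-policy documentation.
platform_policy_name() {
  case "$1" in
    FullRecoveryOnly)            echo "FULL_RECOVERY_ONLY" ;;
    PartialRecoveryMostRecent)   echo "PARTIAL_RECOVERY_MOST_RECENT" ;;
    PartialRecoveryMostComplete) echo "PARTIAL_RECOVERY_MOST_COMPLETE" ;;
    *) echo "unknown policy: $1" >&2; return 1 ;;
  esac
}
```

Any value other than the three listed above makes the function fail, which also makes it usable as a quick check before applying a resource.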

Configuring Force-Start

If you use the FullRecoveryOnly policy, you can configure the Operator to detect failed Hazelcast members and automatically trigger a force-start. The Operator will trigger a force-start only if the cluster is in a PASSIVE state.

The cluster loses all persisted data after a force-start.

Enable autoForceStart:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.1-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    autoForceStart: true

HostPath Support for Persistence

You can also use HostPath to enable persistence.

HostPath support is discouraged in production environments for the reasons given in the Kubernetes documentation.
HostPath support expects the cluster size to equal the number of Kubernetes nodes, with pods distributed evenly across the nodes. You can control how pods are distributed among nodes by setting the topologySpreadConstraints field, which is described in Scheduling Hazelcast Pods.

  1. Create the Hazelcast resource with clusterSize equal to the number of Kubernetes nodes and with appropriate topologySpreadConstraints:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.1-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"
        clusterDataRecoveryPolicy: "FullRecoveryOnly"
        hostPath: "/tmp/hazelcast"
      scheduling:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: hazelcast
  2. Apply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence-hostpath.yaml
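
Because HostPath expects one member per node, it is worth checking the node count before you apply the resource and the pod placement afterwards. The following is a minimal sketch with hypothetical function names (node_count and pods_per_node). Note that kubectl get nodes counts every node, including any that cannot schedule workloads, so treat the count as an upper bound.

```shell
# Hypothetical helpers for checking the HostPath prerequisites.

# Print the number of nodes known to the cluster.
node_count() {
  kubectl get nodes -o name | wc -l | tr -d ' '
}

# Print how many Hazelcast pods run on each node as "<count> <node>" lines.
pods_per_node() {
  kubectl get pods -l "app.kubernetes.io/instance=$1" \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' \
    | sort | uniq -c | awk '{ print $1, $2 }'
}
```

With the example above, node_count should print 3 and pods_per_node hazelcast should list each node once with a count of 1.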