Persistence, Backup and Restore

Persistence allows individual members and whole clusters to recover data by persisting map entries, JCache data, and streaming job snapshots on disk. Members can use persisted data to recover from a planned shutdown (including rolling upgrades), a sudden cluster-wide crash, or a single member failure.

This topic assumes that you know about Persistence in Hazelcast. To learn about Persistence, see the Platform documentation.

The Operator supports two options for enabling Persistence: PVC and HostPath. We recommend using PVC, so the examples on this page use this option. If you need to use HostPath, see HostPath Support for Persistence.

Before you can back up and restore Hazelcast data, you need to enable Persistence in the Hazelcast resource. There are two options for backup and restore: Local (the default) and External.

Enabling Local Persistence

  1. Create the Hazelcast resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"  (1)
        clusterDataRecoveryPolicy: "FullRecoveryOnly"  (2)
        pvc:
          accessModes: ["ReadWriteOnce"]
          requestStorage: 20Gi  (3)
    1 Base directory of the backup data.
    2 Cluster recovery policy.
    3 Size of the PersistentVolumeClaim (PVC) where Hazelcast data is persisted.
    Make sure to calculate the total disk space that you will need. The total used disk space may be larger than the size of the in-memory data, depending on how many backups you take. For example, each local backup can take up roughly as much disk space as the persisted data itself.
  2. Apply the resource:

    kubectl apply -f ./hazelcast-persistence.yaml
  3. Check that Hazelcast members are ready:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast
    NAME          READY   STATUS    RESTARTS   AGE
    hazelcast-0   1/1     Running   0          2m
    hazelcast-1   1/1     Running   0          1m
    hazelcast-2   1/1     Running   0          1m
  4. Check that Hazelcast PVCs are created:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       2m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       1m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       1m
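
You can also check the state that the Operator reports on the Hazelcast resource itself. The exact columns in the output depend on the Operator version:

kubectl get hazelcast hazelcast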

Triggering Local Backups

After Persistence is enabled for the cluster, you can trigger backups using a HotBackup resource.

  1. Create the HotBackup resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
  2. Apply the resource:

    kubectl apply -f ./hot-backup.yaml

Scheduling Backups

You can schedule regular backups using the spec.schedule field of a HotBackup resource.

  1. Create the HotBackup resource with a schedule field:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
      schedule: "* 0-23/6 * * *"

    This field takes a valid cron expression. For example:

    30 10 * * *

    At 10:30 AM every day

    0 0 1,15,25 * *

    At midnight on the 1st, 15th, and 25th of each month

    @monthly

    Once a month, at midnight on the first day of the month

    For the full list of supported expressions, see the library documentation.

  2. Apply the resource:

    kubectl apply -f ./hot-backup-scheduled.yaml

Checking the Status of a Backup

To check the status of a backup, run the following command:

kubectl get hotbackup hot-backup

The status of the backup is displayed in the output.

NAME         STATUS
hot-backup   Success
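
If the backup does not reach the Success status, you can inspect the resource for details. kubectl describe works with any custom resource and shows the full status fields along with any recorded events:

kubectl describe hotbackup hot-backup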

Restoring from Local Backups

To restore a cluster from local backups, you can simply reapply the Hazelcast resource; the new pods are bound to the existing PVCs that contain the persisted data.

  1. Delete the Hazelcast cluster:

    kubectl delete hazelcast hazelcast
  2. Check that the PVCs are still available:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       6m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       5m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       5m
  3. Reapply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence.yaml
  4. See that the same PVCs are bound to the new pods:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast -ojsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}'

    You should see something like the following:

    hot-restart-persistence-hazelcast-0 hot-restart-persistence-hazelcast-1 hot-restart-persistence-hazelcast-2
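
If you are scripting the restore, you can block until all restored members are ready again. This uses standard kubectl; the timeout value below is just an example:

kubectl wait --for=condition=Ready pod -l app.kubernetes.io/instance=hazelcast --timeout=300s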

Enabling External Persistence

In some cases, keeping the data only in PVCs and restoring from them is not enough, for example when you want to move data between two Kubernetes clusters. In such cases, you can back up the data to external storage to make it portable.

To enable external persistence, you need two additional configurations: backupType and agent. When external persistence is used, the Backup Agent container is deployed in the same Pod as the Hazelcast container.

  1. Create the Hazelcast resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        backupType: "External" (1)
        baseDir: "/data/hot-restart/"
        clusterDataRecoveryPolicy: "FullRecoveryOnly"
        pvc:
          accessModes: ["ReadWriteOnce"]
          requestStorage: 20Gi
      agent: (2)
        repository: hazelcast/platform-operator-agent
    1 Where the backup data is stored: either Local or External. The default is Local.
    2 The agent that is responsible for backing up data to the external storage. The agent configuration is optional. If you set backupType to External and do not pass the agent configuration, the Operator uses the latest stable version of the agent.
  2. Apply the resource:

    kubectl apply -f ./hazelcast-persistence-agent.yaml
  3. Check that Hazelcast members are ready:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast
    NAME          READY   STATUS    RESTARTS   AGE
    hazelcast-0   2/2     Running   0          2m55s
    hazelcast-1   2/2     Running   0          2m15s
    hazelcast-2   2/2     Running   0          99s
  4. Check that Hazelcast PVCs are created:

    kubectl get pvc -l app.kubernetes.io/instance=hazelcast
    NAME                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
    hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       2m
    hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       1m
    hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       1m
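
The 2/2 READY count above shows that each Pod now runs two containers: the Hazelcast member and the backup agent sidecar. To confirm that the agent container is present, you can list the container names in a member Pod (the agent container name varies by Operator version):

kubectl get pod hazelcast-0 -o jsonpath='{.spec.containers[*].name}'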

Triggering External Backups

To trigger an external backup, you need to configure a bucket URI and a secret to tell Hazelcast where to store backup data and how to authenticate.

  1. Create the HotBackup resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
      bucketURI: "s3://operator-backup" (1)
      secret: "br-secret-s3" (2)
    1 The bucket URI where backup data will be stored.
    2 Name of the secret with credentials for accessing the given Bucket URI.
  2. Create the secret for your cloud provider (to verify it afterwards, see the tip after these steps):

    AWS: See AWS Session to learn about the authentication procedure.

    kubectl create secret generic <secret-name> --from-literal=region=<region> \
        --from-literal=access-key-id=<access-key-id> \
        --from-literal=secret-access-key=<secret-access-key>

    GCP: See Application Default Credentials to learn about the authentication procedure.

    kubectl create secret generic <secret-name> --from-file=google-credentials-path=<service_account_json_file>

    Azure: See Azure Storage Account Keys to learn about the authentication procedure.

    kubectl create secret generic <secret-name> \
        --from-literal=storage-account=<storage-account> \
        --from-literal=storage-key=<storage-key>
  3. Apply the resource:

    kubectl apply -f ./hot-backup-agent.yaml
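
If the backup fails to start or reports an authentication error, first confirm that the secret exists and contains the expected keys. This is standard kubectl; describe prints the key names and value sizes without revealing the values:

kubectl describe secret br-secret-s3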

Scheduling External Backups

You can schedule external backups using the spec.schedule field of a HotBackup resource.

  1. Create the HotBackup resource with a schedule field:

    apiVersion: hazelcast.com/v1alpha1
    kind: HotBackup
    metadata:
      name: hot-backup
    spec:
      hazelcastResourceName: hazelcast
      bucketURI: "s3://operator-backup"
      secret: "br-secret-s3"
      schedule: "* 0-23/6 * * *"

    This field takes a valid cron expression. For example:

    30 10 * * *

    At 10:30 AM every day

    0 0 1,15,25 * *

    At midnight on the 1st, 15th, and 25th of each month

    @monthly

    Once a month, at midnight on the first day of the month

    For the full list of supported expressions, see the library documentation.

  2. Apply the resource:

    kubectl apply -f ./hot-backup-agent-scheduled.yaml

Checking the Status of a Backup

To check the status of a backup, run the following command:

kubectl get hotbackup hot-backup

The status of the backup is displayed in the output.

NAME         STATUS
hot-backup   Success

Restoring from External Backups

To restore a cluster from external backups, you must enable external restore by adding the restore configuration under persistence. As with external persistence, you may pass the agent configuration; if you do not, the default agent configuration is used. When the external restore mechanism is used, the Restore Agent container is deployed in the same Pod as the Hazelcast container. The agent starts as an initContainer and runs before the Hazelcast container.

  1. Delete the Hazelcast cluster:

    kubectl delete hazelcast hazelcast
  2. Recreate the Hazelcast resource:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"
        clusterDataRecoveryPolicy: "FullRecoveryOnly"
        pvc:
          accessModes: ["ReadWriteOnce"]
          requestStorage: 20Gi
        restore:
          bucketURI: "s3://operator-backup?prefix=hazelcast/2022-06-08-17-01-20/" (1)
          secret: br-secret-s3 (2)
      agent: (3)
        repository: hazelcast/platform-operator-agent
    1 The bucket URI where backup data will be restored from.
    2 Name of the secret with credentials for accessing the given Bucket URI.
    3 The agent that is responsible for restoring data from the external storage. The agent configuration is optional. If you configure restore under persistence and do not pass the agent configuration, the Operator uses the latest stable version of the agent.
  3. If you have not already created the secret, create it as described in Triggering External Backups.

  4. Apply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence-restore.yaml
  5. See that the same PVCs are bound to the new pods:

    kubectl get pods -l app.kubernetes.io/instance=hazelcast -ojsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}'

    You should see something like the following:

    hot-restart-persistence-hazelcast-0 hot-restart-persistence-hazelcast-1 hot-restart-persistence-hazelcast-2
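
Because the restore agent runs as an initContainer, you can also confirm that it was injected by listing the initContainers of a member Pod. This is plain kubectl; the exact initContainer name depends on the Operator version:

kubectl get pod hazelcast-0 -o jsonpath='{.spec.initContainers[*].name}'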

Data Recovery Timeout

To set a data recovery timeout, use the dataRecoveryTimeout field. The field takes an integer value representing the timeout in seconds, and the Operator uses this value to set both the validation-timeout-seconds and data-load-timeout-seconds Hazelcast Persistence options. For example, dataRecoveryTimeout: 600 gives the cluster 10 minutes for the validation phase and 10 minutes for the data-load phase.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    dataRecoveryTimeout: 600
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi

Choosing a Cluster Recovery Policy

To decide how a cluster should behave when one or more members cannot rejoin after a cluster-wide restart, you can define one of the following cluster recovery policies. The Operator supports all the policies in the Hazelcast Platform cluster-data-recovery-policy configuration options. For complete descriptions and advice on choosing a policy, see the Platform documentation.

FullRecoveryOnly

Does not allow a partial start of the cluster.

PartialRecoveryMostRecent

Allows a partial start with the members that have the most recent partition table.

PartialRecoveryMostComplete

Allows a partial start with the members that have the most complete partition table.

Configuring Force-Start

If you use the FullRecoveryOnly policy, you can configure the Operator to detect failed Hazelcast members and automatically trigger a force-start. The Operator will trigger a force-start only if the cluster is in a PASSIVE state.

The cluster loses all persisted data after a force-start.

Enable autoForceStart:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    autoForceStart: true

HostPath Support for Persistence

You can also use HostPath to enable persistence.

HostPath support is discouraged for production environments for the reasons mentioned in the Kubernetes documentation.
HostPath support expects the size of the cluster to be equal to the number of Kubernetes nodes, with pods distributed evenly across those nodes. You can manage how pods are distributed among nodes by setting the topologySpreadConstraints field, which is described in Scheduling Hazelcast Pods.
  1. Create the Hazelcast resource with clusterSize equal to the number of Kubernetes nodes, and set appropriate topologySpreadConstraints:

    apiVersion: hazelcast.com/v1alpha1
    kind: Hazelcast
    metadata:
      name: hazelcast
    spec:
      clusterSize: 3
      repository: 'docker.io/hazelcast/hazelcast-enterprise'
      version: '5.1.3-slim'
      licenseKeySecret: hazelcast-license-key
      persistence:
        baseDir: "/data/hot-restart/"
        clusterDataRecoveryPolicy: "FullRecoveryOnly"
        hostPath: "/tmp/hazelcast"
      scheduling:
        topologySpreadConstraints:
          - maxSkew: 1
            topologyKey: kubernetes.io/hostname
            whenUnsatisfiable: DoNotSchedule
            labelSelector:
              matchLabels:
                app.kubernetes.io/instance: hazelcast
  2. Apply the Hazelcast resource:

    kubectl apply -f ./hazelcast-persistence-hostpath.yaml
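
To confirm that the topology spread constraint placed each member on a separate node, list the pods together with their node assignments:

kubectl get pods -l app.kubernetes.io/instance=hazelcast -o wide

Each pod should be scheduled on a different Kubernetes node.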