Persistence, backup and restore

Persistence allows individual members and whole clusters to recover data by persisting map entries, JCache data, and streaming job snapshots on disk. Members can use persisted data to recover from a planned shutdown (including rolling upgrades), a sudden cluster-wide crash, or a single member failure.

To learn more about persistence, see Persisting data on disk.

Backups can be either of the following:

Local: Local backups are kept in the member's persistent volume and are never moved anywhere.
External: External backups are uploaded to buckets that you provide.

For a worked example, see the Restore a cluster from cloud storage tutorial.

Enable persistence

Enabling Hazelcast persistence requires the following configuration.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"  (1)
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi  (2)
  agent: (3)
    repository: hazelcast/platform-operator-agent

1	Cluster recovery policy, which defines how the cluster behaves when members cannot rejoin after a cluster-wide restart.
2	Size of the PersistentVolumeClaim (PVC) where Hazelcast data is persisted.
3	Agent responsible for moving data from the local storage to external buckets. The agent configuration is optional. If you enable `persistence` and do not pass the agent configuration, Operator uses the latest agent version that is compatible with its version.

If you want to enable persistence for a map or cache, you must also enable it in the Map or Cache CR by setting persistenceEnabled: true. For more information, see Configuring Map and Configuring Cache.

Calculate the total disk space you need before you set requestStorage. The disk space used can be larger than the size of the in-memory data, depending on how many backups you take.
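As a rough planning aid, the following Python sketch estimates per-member disk usage. It assumes, conservatively, that every retained local backup is a full copy of the member's persisted data, and it adds a headroom fraction that you choose. The helper name and the 20% headroom default are illustrative assumptions, not Operator behavior.

```python
def estimate_disk_gib(persisted_gib: float, retained_backups: int, headroom: float = 0.2) -> float:
    """Rough per-member disk estimate in GiB (hypothetical planning helper).

    Assumes each retained local backup is a full copy of the member's
    persisted data, which makes this an upper bound under that assumption,
    and adds `headroom` (a fraction, 0.2 = 20%) for overhead.
    """
    if persisted_gib < 0 or retained_backups < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    copies = 1 + retained_backups  # the live persistence store plus each backup
    return persisted_gib * copies * (1 + headroom)


if __name__ == "__main__":
    # 4 GiB persisted per member, 3 backups kept on the volume.
    print(f"{estimate_disk_gib(4, 3):.1f} GiB")
```

With 4 GiB persisted per member and three retained backups, the sketch suggests about 19.2 GiB, which only just fits within `requestStorage: 20Gi`.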

Trigger local backups

You can take local backups using the HotBackup custom resource. Local backups are kept in the volume and are not moved anywhere.

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast

Trigger external backups

In some cases, using a PVC is not enough, for example, if you need to move data between two Kubernetes clusters. For these cases, you can use external storage.

When persistence is enabled, Hazelcast pods start with a sidecar agent which uploads the backups into an external bucket you provide. For external storage, AWS S3, GCP Bucket and Azure Blob storage options are supported.

To trigger an external backup, you need to configure a bucket URI and a secret to tell Hazelcast where to store backup data and how to authenticate.


External backup in AWS S3:

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "s3://bucket-name/path/to/folder" (1)
  secretName: "secret-aws-s3" (2)

External backup in GCP Bucket:

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "gs://bucket-name/path/to/folder" (1)
  secretName: "secret-gcp-bucket" (2)

External backup in Azure Blob:

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "azblob://bucket-name/path/to/folder" (1)
  secretName: "secret-azure-blob" (2)

For further information about accessing resources on different cloud providers, see Authorization methods to access cloud provider resources.

1	The bucket URI where backup data will be stored.
2	Name of the secret with credentials for accessing the given bucket URI.
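The scheme of the bucket URI identifies the storage provider. The following Python sketch, a hypothetical helper that is not part of Operator, splits such a URI so that you can sanity-check a value before putting it in `bucketURI`. It accepts only the three schemes shown on this page, and it also handles the `?prefix=` form used in the restore examples later on this page.

```python
from urllib.parse import parse_qs, urlsplit

# Schemes used in the examples on this page; this sketch accepts only these.
SUPPORTED_SCHEMES = {"s3": "AWS S3", "gs": "GCP Bucket", "azblob": "Azure Blob"}


def parse_bucket_uri(uri: str) -> tuple[str, str, str]:
    """Split a bucket URI into (scheme, bucket, path); hypothetical helper.

    Handles the path form ("s3://bucket/path/to/folder") and the query form
    ("s3://bucket?prefix=path/"). If a `prefix` query parameter is present,
    it takes the place of the path.
    """
    parts = urlsplit(uri)
    if parts.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme {parts.scheme!r} in {uri!r}")
    if not parts.netloc:
        raise ValueError(f"missing bucket name in {uri!r}")
    path = parts.path.lstrip("/")
    prefix = parse_qs(parts.query).get("prefix")
    if prefix:
        path = prefix[0]
    return parts.scheme, parts.netloc, path


if __name__ == "__main__":
    for uri in ("s3://bucket-name/path/to/folder",
                "s3://operator-backup?prefix=hazelcast/2022-06-08-17-01-20/"):
        scheme, bucket, path = parse_bucket_uri(uri)
        print(f"{SUPPORTED_SCHEMES[scheme]}: bucket={bucket} path={path}")
```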

Schedule backups

You can schedule backups using the schedule and hotBackupTemplate fields of the CronHotBackup resource. For more information about the CronHotBackup resource, see the API reference.

apiVersion: hazelcast.com/v1alpha1
kind: CronHotBackup
metadata:
  name: cron-hot-backup
spec:
  schedule: "0 0-23/6 * * *"
  hotBackupTemplate:
    spec:
      hazelcastResourceName: hazelcast

The schedule field takes a valid cron expression. For example, you can configure the following scheduled backups:

30 10 * * *

At 10:30 AM every day

0 0 1,15,25 * *

On 1st, 15th, and 25th of each month at midnight

@monthly

On the first day of each month at midnight

For a full list of supported expressions, see the library documentation.
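To see how the five fields (minute, hour, day of month, month, day of week) combine, the following Python sketch checks whether a given minute matches an expression. It is not the parser Operator uses: it supports only `*`, single values, lists, ranges, steps and a few `@` macros, and it requires every field to match, whereas classic cron treats day of month and day of week as alternatives when both are restricted.

```python
from datetime import datetime

MACROS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
}
# (min, max) for minute, hour, day of month, month, day of week (0 = Sunday)
BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, lo: int, hi: int) -> set[int]:
    """Expand one field such as "*", "5", "1,15,25", "0-23/6" or "*/6"."""
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step_str = part.partition("/")
        step = int(step_str) if step_str else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            end = hi if step_str else start  # "5/10" means 5, 15, 25, ...
        if step < 1 or not (lo <= start <= end <= hi):
            raise ValueError(f"bad field {field!r}")
        values.update(range(start, end + 1, step))
    return values


def matches(expr: str, when: datetime) -> bool:
    """Return True if `when` (to the minute) matches the cron expression."""
    fields = MACROS.get(expr, expr).split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields: {expr!r}")
    # Python's weekday() is Monday=0; cron uses Sunday=0.
    actual = [when.minute, when.hour, when.day, when.month, (when.weekday() + 1) % 7]
    return all(v in expand_field(f, lo, hi)
               for f, (lo, hi), v in zip(fields, BOUNDS, actual))
```

The minute field deserves attention: `* 0-23/6 * * *` matches every minute of hours 0, 6, 12 and 18, so it would start a backup sixty times in each of those hours, whereas `0 0-23/6 * * *` matches once every six hours.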

Check the status of a backup

To check the status of a backup, run the following command:

kubectl get hotbackup hot-backup

The status of the backup is displayed in the output.

NAME         STATUS
hot-backup   Success
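If you script backups, you may want to read this status programmatically. The following Python sketch parses the table format shown above by locating the NAME and STATUS columns, so extra columns do not break it. The `hotbackup_status` wrapper shells out to `kubectl`; both helper names are hypothetical. Only `Success` is named on this page, so treat any other value as not ready for a restore.

```python
import subprocess
from typing import Optional


def parse_status(table: str, name: str) -> Optional[str]:
    """Return the STATUS column for `name` from `kubectl get hotbackup` table output."""
    lines = [line for line in table.strip().splitlines() if line.strip()]
    if not lines:
        return None
    header = lines[0].split()
    if "NAME" not in header or "STATUS" not in header:
        return None
    name_col, status_col = header.index("NAME"), header.index("STATUS")
    for line in lines[1:]:
        cols = line.split()
        if len(cols) > max(name_col, status_col) and cols[name_col] == name:
            return cols[status_col]
    return None


def hotbackup_status(name: str, namespace: str = "default") -> Optional[str]:
    """Run `kubectl get hotbackup <name>` and return its STATUS (hypothetical wrapper)."""
    result = subprocess.run(
        ["kubectl", "get", "hotbackup", name, "-n", namespace],
        capture_output=True, text=True, check=True,
    )
    return parse_status(result.stdout, name)
```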

Restore from local backups

To restore a cluster from local backups, you can reapply the Hazelcast resource, which gives the cluster access to the PVCs that contain the persisted data. The Hazelcast cluster is then restored from the existing hot-restart folders.

To restore from local backups that you have taken using the HotBackup resource, give the HotBackup resource name in the restore configuration. For the restore to work correctly, make sure the status of the HotBackup resource is Success.

When this restore mechanism is used, the Restore Agent container is deployed with the Hazelcast container in the same Pod. The agent starts as an initContainer before the Hazelcast container.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast (1)
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      hotBackupResourceName: hot-backup (2)
  agent: (3)
    repository: hazelcast/platform-operator-agent

1	The Hazelcast custom resource name must be the same in the backup and restore configurations; otherwise, the restore fails.
2	`HotBackup` resource name used for the restore. The backup folder name will be the name you provide here.
3	Agent responsible for restoring data from the local storage. The agent configuration is optional. If you give `restore` under `persistence` and do not pass the agent configuration, Operator uses the latest agent version that is compatible with its version.

You can use a local backup to restore a cluster only once. If you need to restore the same backup more than once, or across clusters, we recommend that you back up to external storage.
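Before reapplying the Hazelcast resource, you can check the conditions described in this section: the Hazelcast CR names match, the HotBackup status is Success, and the local backup has not been used for a restore yet. The following Python sketch models those checks. The `HotBackupInfo` class is hypothetical, and this page describes no field that records whether a backup was already used, so `used_for_restore` is something you would track yourself.

```python
from dataclasses import dataclass


@dataclass
class HotBackupInfo:
    """The parts of a HotBackup resource that matter for a local restore (hypothetical model)."""
    name: str
    hazelcast_resource_name: str  # spec.hazelcastResourceName of the HotBackup
    status: str                   # STATUS column of `kubectl get hotbackup`
    used_for_restore: bool = False  # tracked by you, not by Operator


def local_restore_problems(hazelcast_cr_name: str, backup: HotBackupInfo) -> list[str]:
    """Return the reasons a local restore would not work; an empty list means OK."""
    problems = []
    if backup.hazelcast_resource_name != hazelcast_cr_name:
        problems.append(
            f"Hazelcast CR name {hazelcast_cr_name!r} differs from the backed-up "
            f"cluster {backup.hazelcast_resource_name!r}")
    if backup.status != "Success":
        problems.append(f"HotBackup {backup.name!r} has status {backup.status!r}, not 'Success'")
    if backup.used_for_restore:
        problems.append(f"local backup {backup.name!r} was already used; use an external backup")
    return problems
```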

Restore from external backups

To restore a cluster from an external backup, you can either set up the bucket configuration or give the HotBackup resource name that you used to trigger the external backup. In either case, the backup is restored from the external bucket.

When this restore mechanism is used, the Restore Agent container is deployed with the Hazelcast container in the same Pod. The agent starts as an initContainer before the Hazelcast container.

If you have not created the secret yet, create it as described in Trigger external backups.

Restore using a bucket configuration:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      bucketConfig:
        bucketURI: "s3://operator-backup?prefix=hazelcast/2022-06-08-17-01-20/" (1)
        secretName: br-secret-s3 (2)
  agent: (3)
    repository: hazelcast/platform-operator-agent

1	Bucket URI where backup data will be restored from.
2	Name of the secret with credentials for accessing the given bucket URI.
3	Agent responsible for restoring data from the external storage. The agent configuration is optional. If you provide `restore` under `persistence` and do not pass the agent configuration, Operator uses the latest agent version that is compatible with its version.

Restore using the HotBackup resource name:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      hotBackupResourceName: hot-backup (1)
  agent: (2)
    repository: hazelcast/platform-operator-agent

1	`HotBackup` resource name used for the restore. The bucket URI and secret are taken from the `HotBackup` resource.
2	Agent responsible for restoring data from external storage. The agent configuration is optional. If you provide `restore` under `persistence` and do not pass the agent configuration, Operator uses the latest agent version that is compatible with its version.

Restore from a persistent volume

To restore from local backups that are in an existing persistent volume, configure localConfig.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast (1)
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  customConfigCmName: map-custom-config (2)
  persistence:
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      localConfig:
        pvcNamePrefix: "hot-restart-persistence" (3)
        baseDir: "base" (4)
        backupDir: "backup" (5)
        backupFolder: "backup-1709045128548" (6)

1	To attach the existing PVCs to the newly created cluster, name the new Hazelcast CR as follows. If the old cluster was created by Operator, the backup and restore Hazelcast custom resources must have the same name. If the old cluster was created by the Hazelcast Helm chart, the new CR name must have the format `<release-name>-hazelcast-enterprise`. For example, if the old cluster was installed with `helm install hz hazelcast/hazelcast-enterprise`, the new CR name must be `hz-hazelcast-enterprise`.
2	If you configured maps with persistence, you must create them at the same time as the Hazelcast CR by using a custom configuration. Applying the Hazelcast CR first and the Map CRs afterwards does not work.
3	`pvcNamePrefix` is the prefix of the existing PVCs. Depending on how the old cluster was installed, it is either `persistence` or `hot-restart-persistence`. Run `kubectl get pvc` to check which prefix your PVCs use.
4	`baseDir` is the root directory for persistence.
5	`backupDir` is the directory that contains a backupFolder for each available backup.
6	`backupFolder` is the directory containing the backup for the restore.
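The example pod on this page mounts the claim `hot-restart-persistence-hazelcast-0`, which follows the Kubernetes StatefulSet naming convention `<volume-claim-template>-<statefulset>-<ordinal>`. Assuming the StatefulSet is named after the Hazelcast CR, as that example suggests, the following Python sketch lists the PVC names a new cluster would look for and reports which are missing from `kubectl get pvc -o name` output. The helper names are hypothetical.

```python
def expected_pvc_names(pvc_name_prefix: str, statefulset_name: str, cluster_size: int) -> list[str]:
    """PVC names a StatefulSet uses: <claim-template>-<statefulset>-<ordinal>."""
    return [f"{pvc_name_prefix}-{statefulset_name}-{i}" for i in range(cluster_size)]


def missing_pvcs(expected: list[str], existing: list[str]) -> list[str]:
    """Expected PVCs absent from `existing` (plain names or `kubectl get pvc -o name` lines)."""
    have = {name.strip().removeprefix("persistentvolumeclaim/") for name in existing}
    return [name for name in expected if name not in have]
```

If any name is missing, the new cluster would not find that member's persisted data, which usually means the CR name or `pvcNamePrefix` does not match the old installation.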

To find the backupFolder value, you can run kubectl exec -it <hazelcast-custom-resource-name>-0 -c hazelcast -- /bin/bash and list the contents of your existing installation. If you already deleted your installation, you can run a simple pod that lists the contents of the PVC, and then check its logs with kubectl logs list-content-pod to see the folder structure of the specified PVC. For example:

apiVersion: v1
kind: Pod
metadata:
  name: list-content-pod
spec:
  containers:
    - name: list-content-container
      image: busybox
      command: ["/bin/sh", "-c", "tree /data/persistence"] (1)
      volumeMounts:
        - name: pv-storage
          mountPath: /data/persistence (2)
  volumes:
    - name: pv-storage
      persistentVolumeClaim:
        claimName: hot-restart-persistence-hazelcast-0 (3)

1	Replace `/data/persistence` with the path where the PV is mounted inside the container.
2	Replace `/data/persistence` with the mount path you want to use inside the container; it must match the path in the command.
3	Replace with the name of one of the PVCs that was mounted to the cluster from which the backup was taken.

The agent copies the backup to be restored from {baseDir}/{backupDir}/{backupFolder} to /data/persistence/base-dir.
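If the listing shows several backup folders, you must pick one for backupFolder. The following Python sketch picks the folder with the largest numeric suffix and joins the three localConfig values in the order given above. It assumes that folders are named `backup-<number>` and that the number grows with each backup; the example value `backup-1709045128548` looks like a millisecond Unix timestamp, but that is an inference. Both helper names are hypothetical.

```python
import posixpath
import re

BACKUP_FOLDER = re.compile(r"backup-(\d+)")


def latest_backup_folder(folder_names: list[str]) -> str:
    """Pick the folder with the largest numeric suffix, for example backup-1709045128548."""
    candidates = [(int(m.group(1)), name) for name in folder_names
                  if (m := BACKUP_FOLDER.fullmatch(name))]
    if not candidates:
        raise ValueError("no backup-<number> folders found")
    return max(candidates)[1]


def restore_source(base_dir: str, backup_dir: str, backup_folder: str) -> str:
    """Join the localConfig values as {baseDir}/{backupDir}/{backupFolder}."""
    return posixpath.join(base_dir, backup_dir, backup_folder)
```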

Configure persistence

Data recovery timeout

To set a data recovery timeout, use dataRecoveryTimeout. The field takes an integer timeout in seconds, and Operator uses this value to set both the validation-timeout-seconds and data-load-timeout-seconds Hazelcast persistence options.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    dataRecoveryTimeout: 600
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
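The following Python sketch mirrors the mapping described above: one dataRecoveryTimeout value feeds both Hazelcast persistence options. The type and positivity checks are assumptions of this sketch, not documented Operator validation.

```python
def persistence_timeouts(data_recovery_timeout: int) -> dict[str, int]:
    """Map dataRecoveryTimeout (seconds) onto the two Hazelcast persistence options."""
    if isinstance(data_recovery_timeout, bool) or not isinstance(data_recovery_timeout, int):
        raise TypeError("dataRecoveryTimeout must be an integer number of seconds")
    if data_recovery_timeout <= 0:
        raise ValueError("dataRecoveryTimeout must be positive (assumption of this sketch)")
    return {
        "validation-timeout-seconds": data_recovery_timeout,
        "data-load-timeout-seconds": data_recovery_timeout,
    }
```

With `dataRecoveryTimeout: 600`, both options receive 600 seconds.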

Choose a cluster recovery policy

To decide how a cluster should behave when one or more members cannot rejoin after a cluster-wide restart, you can define one of the following cluster recovery policies. Operator supports all the policies in the Hazelcast Platform cluster-data-recovery-policy configuration options. For complete descriptions and advice on choosing a policy, see Configure persistence.

FullRecoveryOnly

Does not allow a partial start of the cluster.

PartialRecoveryMostRecent

Allows a partial start with the members that have the most recent partition table.

PartialRecoveryMostComplete

Allows a partial start with the members that have the most complete partition table.
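In Hazelcast Platform member configuration, the cluster-data-recovery-policy values are spelled FULL_RECOVERY_ONLY, PARTIAL_RECOVERY_MOST_RECENT and PARTIAL_RECOVERY_MOST_COMPLETE. When cross-referencing the Hazelcast Platform documentation, the following Python sketch converts the Operator spelling to the Platform spelling. It assumes the one-to-one correspondence implied by Operator supporting all the Platform policies; the helper name is hypothetical.

```python
import re

OPERATOR_POLICIES = ("FullRecoveryOnly", "PartialRecoveryMostRecent", "PartialRecoveryMostComplete")


def to_hazelcast_policy(operator_value: str) -> str:
    """Convert an Operator policy name to Hazelcast's cluster-data-recovery-policy spelling."""
    if operator_value not in OPERATOR_POLICIES:
        raise ValueError(f"unknown policy: {operator_value!r}")
    # Insert "_" before every capital letter except the first, then upper-case.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", operator_value).upper()
```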

Configure force or partial start

To recover a cluster that has persistence enabled when one or more members fail to restart after a cluster-wide restart, you can trigger a force start, which makes the cluster delete its persistence stores. Alternatively, you can trigger a partial start, which makes the cluster start without the failed members.

The cluster loses all persisted data after a force start if any member fails to start properly. If all members successfully start, all persisted data remains intact.
The cluster loses persisted data after a partial start only for specific members that have failed to start properly. Persisted data remains intact for members that successfully start.
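The following Python sketch restates these two rules as a function, which can help when reasoning about which members lose data before you choose an action. It is an illustrative model of the behavior described above, not Operator code, and the member names in the usage are hypothetical.

```python
def members_losing_data(startup_action: str, members: list[str], failed: set[str]) -> set[str]:
    """Illustrative model of which members lose persisted data after a recovery action."""
    unknown = failed - set(members)
    if unknown:
        raise ValueError(f"failed members not in cluster: {sorted(unknown)}")
    if startup_action == "ForceStart":
        # If any member fails to start, the whole cluster loses its persisted data.
        return set(members) if failed else set()
    if startup_action == "PartialStart":
        # Only the members that failed to start lose their persisted data.
        return set(failed)
    raise ValueError(f"unknown startupAction: {startup_action!r}")
```

For a three-member cluster where only `hazelcast-2` fails, a force start loses data on all three members, while a partial start loses data only on `hazelcast-2`.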

To trigger the cluster recovery action, set startupAction:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.7.0-slim'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 8Gi
      storageClassName: "standard"
    startupAction: ForceStart

PartialStart can be triggered only when clusterDataRecoveryPolicy is set to PartialRecoveryMostRecent or PartialRecoveryMostComplete.
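If you generate Hazelcast CRs from scripts, you can catch an invalid combination before applying it. The following Python sketch encodes the rule above; it knows only the policy and action values shown on this page, does not restrict ForceStart because this page does not, and treats a missing startupAction as valid. The helper name is hypothetical.

```python
from typing import Optional

FULL_POLICY = "FullRecoveryOnly"
PARTIAL_POLICIES = {"PartialRecoveryMostRecent", "PartialRecoveryMostComplete"}


def startup_action_error(policy: str, startup_action: Optional[str]) -> Optional[str]:
    """Return an error message for an invalid combination, or None if it is allowed."""
    if policy != FULL_POLICY and policy not in PARTIAL_POLICIES:
        return f"unknown clusterDataRecoveryPolicy: {policy!r}"
    if startup_action is None or startup_action == "ForceStart":
        return None  # this sketch does not restrict ForceStart
    if startup_action == "PartialStart":
        if policy in PARTIAL_POLICIES:
            return None
        return f"PartialStart requires a partial recovery policy, not {policy!r}"
    return f"unknown startupAction: {startup_action!r}"
```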
