Persistence, Backup and Restore
Persistence allows individual members and whole clusters to recover data by persisting map entries, JCache data, and streaming job snapshots on disk. Members can use persisted data to recover from a planned shutdown (including rolling upgrades), a sudden cluster-wide crash, or a single member failure.
This topic assumes that you know about Persistence in Hazelcast. To learn about Persistence, see the Platform documentation.
Backups can be either of the following:
- Local: Local backups are kept in the volume and are never moved anywhere.
- External: External backups are moved into buckets provided by the user.
For a working example, see this tutorial.
Enabling Persistence
To enable Persistence, apply the following configuration.
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly" # (1)
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi # (2)
  agent: # (3)
    repository: hazelcast/platform-operator-agent
1 | Cluster recovery policy. |
2 | Size of the PersistentVolumeClaim (PVC) where Hazelcast data is persisted. |
3 | Agent responsible for moving data from the local storage to external buckets.
The agent configuration is optional. If you enable persistence and do not pass the agent configuration, Hazelcast Platform Operator
uses the latest agent version that is compatible with its version. |
Make sure to calculate the total disk space that you will use. The total used disk space may be larger than the size of in-memory data, depending on how many backups you take.
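As a rough worked example of the sizing advice above, the fragment below assumes (hypothetically) that each member persists about 5 GiB and that up to three local backups of similar size are kept on the same volume. Under those assumptions, an upper-bound estimate is 5 GiB × (1 + 3) = 20 GiB per member. The figures are illustrative only, not measured values.

```yaml
# Sizing sketch with assumed figures; adjust to your own workload.
#   persisted data per member : ~5 GiB   (assumption)
#   local backups retained    : 3        (assumption), each up to ~5 GiB
#   upper-bound estimate      : 5 GiB * (1 + 3) = 20 GiB
# This fragment belongs under spec of the Hazelcast resource.
persistence:
  pvc:
    accessModes: ["ReadWriteOnce"]
    requestStorage: 20Gi
```

Actual usage can be lower than this estimate, so treat it as a starting point and monitor the volume after the first few backups.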
Triggering Local Backups
You can take local backups using the HotBackup custom resource. Local backups are kept in the volume and are not moved anywhere.
apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
Triggering External Backups
In some cases, keeping data only in the PVC and restoring from it is not enough, for example, when you need to move data between two Kubernetes clusters. Storing backups in external storage makes them portable.
When persistence is enabled, the Hazelcast pod starts with a sidecar agent that uploads backups to an external bucket provided by the user. The supported external storage options are AWS S3, GCP Bucket, and Azure Blob Storage.
To trigger an external backup, configure a bucket URI and a secret to tell Hazelcast where to store backup data and how to authenticate.
External backup in AWS S3:
apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "s3://bucket-name/path/to/folder" # (1)
  secretName: "secret-aws-s3" # (2)
See AWS Session to learn about the authentication procedure.
kubectl create secret generic <secret-name> \
  --from-literal=region=<region> \
  --from-literal=access-key-id=<access-key-id> \
  --from-literal=secret-access-key=<secret-access-key>
External backup in GCP Bucket:
apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "gs://bucket-name/path/to/folder" # (1)
  secretName: "secret-gcp-bucket" # (2)
See Application Default Credentials to learn about the authentication procedure.
kubectl create secret generic <secret-name> \
  --from-file=google-credentials-path=<service_account_json_file>
External backup in Azure Blob:
apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "azblob://bucket-name/path/to/folder" # (1)
  secretName: "secret-azure-blob" # (2)
See Azure Storage Account Keys to learn about the authentication procedure.
kubectl create secret generic <secret-name> \
  --from-literal=storage-account=<storage-account> \
  --from-literal=storage-key=<storage-key>
1 | Bucket URI where backup data is stored. |
2 | Name of the secret with credentials for accessing the given Bucket URI. |
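If you manage resources declaratively, the kubectl create secret command shown for AWS S3 can also be expressed as a manifest. The following is a minimal sketch using the standard Kubernetes Secret resource. The secret name matches the secretName in the AWS example, the keys are the ones used in the command above, and the angle-bracket values are placeholders that you must replace.

```yaml
# Declarative equivalent of the AWS S3 `kubectl create secret generic` command.
apiVersion: v1
kind: Secret
metadata:
  name: secret-aws-s3
type: Opaque
stringData:               # plain-text values; Kubernetes stores them base64-encoded
  region: <region>
  access-key-id: <access-key-id>
  secret-access-key: <secret-access-key>
```

Create the secret in the same namespace as the HotBackup resource that references it.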
Scheduling Backups
You can schedule backups using the schedule and hotBackupTemplate fields of the CronHotBackup resource. For more information about the CronHotBackup resource, see the API Reference.
apiVersion: hazelcast.com/v1alpha1
kind: CronHotBackup
metadata:
  name: cron-hot-backup
spec:
  schedule: "* 0-23/6 * * *"
  hotBackupTemplate:
    spec:
      hazelcastResourceName: hazelcast
The schedule field takes a valid cron expression. For example, you can configure the following scheduled backups:
30 10 * * * | At 10:30 AM every day |
0 0 1,15,25 * * | On 1st, 15th, and 25th of each month at midnight |
0 0 1 * * | On the first day of each month at midnight |
For a full list of supported expressions, see the library documentation.
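To illustrate how a schedule and a backup template combine, the sketch below runs an external backup at 10:30 AM every day. It assumes that hotBackupTemplate.spec accepts the same fields as the HotBackup spec (bucketURI and secretName); check the API Reference for your Operator version. The resource name is hypothetical, and the bucket and secret values are reused from the AWS S3 example.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: CronHotBackup
metadata:
  name: cron-hot-backup-daily    # hypothetical name
spec:
  schedule: "30 10 * * *"        # minute 30, hour 10, every day
  hotBackupTemplate:
    spec:
      hazelcastResourceName: hazelcast
      # Assumed to be accepted here as in a HotBackup spec:
      bucketURI: "s3://bucket-name/path/to/folder"
      secretName: "secret-aws-s3"
```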
Checking the Status of a Backup
To check the status of a local backup, run the following command:
kubectl get hotbackup hot-backup
The status of the backup is displayed in the output.
NAME STATUS
hot-backup Success
Restoring from Local Backups
To restore a cluster from local backups, you can directly reapply the Hazelcast resource, which gives the cluster access to the PVCs that contain the persisted data. This restores the Hazelcast cluster from the existing hot-restart folders.
Alternatively, to restore from local backups that you have taken using the HotBackup resource, give the HotBackup resource name in the restore configuration. For the restore to work correctly, make sure the status of the HotBackup resource is Success.
When this restore mechanism is used, the Restore Agent container is deployed with the Hazelcast container in the same Pod. The agent starts as an initContainer before the Hazelcast container.
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast # (1)
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      hotBackupResourceName: hot-backup # (2)
  agent: # (3)
    repository: hazelcast/platform-operator-agent
1 | The Hazelcast custom resource name must be the same for both the backup and restore CRs; otherwise, the restore fails. |
2 | HotBackup resource name used for the restore. The backup folder name will be the name you provide here. |
3 | Agent responsible for restoring data from the local storage.
The agent configuration is optional. If you give restore under persistence and do not pass the agent configuration, Hazelcast Platform Operator
uses the latest agent version that is compatible with its version. |
You can use a local backup only once to restore a cluster. We recommend that you back up externally if you need to restore from the same backup repeatedly or across clusters.
Restoring from External Backups
To restore a cluster from external backups, you can either set up the bucket configuration or give the HotBackup resource name that you used to trigger the external backup. In either case, the backup is restored from the external bucket.
When this restore mechanism is used, the Restore Agent container is deployed with the Hazelcast container in the same Pod. The agent starts as an initContainer before the Hazelcast container.
If you have not created the secret, you must do so in the same way as in Triggering External Backups.
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      bucketConfig:
        bucketURI: "s3://operator-backup?prefix=hazelcast/2022-06-08-17-01-20/" # (1)
        secretName: br-secret-s3 # (2)
  agent: # (3)
    repository: hazelcast/platform-operator-agent
1 | Bucket URI where backup data will be restored from. |
2 | Name of the secret with credentials for accessing the given bucket URI. |
3 | Agent which is responsible for restoring data from the external storage.
The agent configuration is optional. If you provide restore under persistence and do not pass the agent configuration, Hazelcast Platform Operator
uses the latest agent version that is compatible with its version. |
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      hotBackupResourceName: hot-backup # (1)
  agent: # (2)
    repository: hazelcast/platform-operator-agent
1 | HotBackup resource name used for the restore. The bucket URI and secret are taken from the HotBackup resource. |
2 | Agent responsible for restoring data from external storage.
The agent configuration is optional. If you provide restore under persistence and do not pass the agent configuration, Hazelcast Platform Operator
uses the latest agent version that is compatible with its version. |
Restoring from Persistent Volumes
To restore from local backups that are in an existing Persistent Volume, configure localConfig.
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast # (1)
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  customConfigCmName: mapCustomConfig # (2)
  persistence:
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      localConfig:
        pvcNamePrefix: "hot-restart-persistence" # (3)
        baseDir: "base" # (4)
        backupDir: "backup" # (5)
        backupFolder: "backup-1709045128548" # (6)
1 | To successfully attach existing PVCs to the newly created cluster:
|
2 | If you configured maps with persistence, you must create them while creating the Hazelcast CR. To create them while creating the Hazelcast CR, you must use Custom Config. You cannot apply the Hazelcast CR and then the Map CRs. |
3 | pvcNamePrefix is the prefix of the existing PVCs. It can be set to persistence or hot-restart-persistence, depending on the installation method of the old cluster. Different versions of the Hazelcast Helm Chart and Hazelcast Platform Operator use one of these values by default. Run the kubectl get pvc command to see which prefix your existing PVCs use. |
4 | baseDir is the root directory for persistence. |
5 | backupDir is the directory that contains a backupFolder for each available backup. |
6 | backupFolder is the directory containing the specific backup for the restore. |
To find the backupFolder value, you can run kubectl exec -it <hazelcast-custom-resource-name> -c hazelcast -- /bin/bash and list the contents of the backup directory (backupDir) in your existing installation. If you have already deleted your installation, you can run a simple pod that lists the contents of the PVC, and then check its logs to see the folder structure of the specified PVC. For example:
apiVersion: v1
kind: Pod
metadata:
  name: list-content-pod
spec:
  containers:
    - name: list-content-container
      image: busybox
      command: ["/bin/sh", "-c", "tree /data/persistence"] # (1)
      volumeMounts:
        - name: pv-storage
          mountPath: /data/persistence # (2)
  volumes:
    - name: pv-storage
      persistentVolumeClaim:
        claimName: hot-restart-persistence-hazelcast-0 # (3)
1 | Replace /data/persistence with the path to your PV, which is mounted inside the container. |
2 | Replace /data/persistence with the correct mountPath specified in PV. |
3 | Replace with the name of one of the PVCs mounted to the cluster from which the backup was taken. |
The agent copies the backup to be restored from {baseDir}/{backupDir}/{backupFolder} to /data/persistence/base-dir.
Configuring Persistence
Data Recovery Timeout
To choose a data recovery timeout, use dataRecoveryTimeout. The field takes an integer value that represents the timeout in seconds. The Operator uses this value to set both the validation-timeout-seconds and data-load-timeout-seconds Hazelcast Persistence options.
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    dataRecoveryTimeout: 600
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
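For orientation, a dataRecoveryTimeout of 600 would roughly correspond to the following member-level Hazelcast Platform configuration. This is a sketch of the equivalent settings, assuming the Operator applies the value unchanged to both options. The Operator generates the actual member configuration, so you do not need to write this yourself.

```yaml
# Approximate member configuration implied by dataRecoveryTimeout: 600 (assumption).
hazelcast:
  persistence:
    enabled: true
    validation-timeout-seconds: 600   # time allowed for validating persisted data
    data-load-timeout-seconds: 600    # time allowed for loading persisted data
```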
Choosing a Cluster Recovery Policy
To decide how a cluster should behave when one or more members cannot rejoin after a cluster-wide restart, you can define one of the following cluster recovery policies. The Operator supports all the policies in the Hazelcast Platform cluster-data-recovery-policy configuration options. For complete descriptions and advice on choosing a policy, see the Platform documentation.
FullRecoveryOnly | Does not allow partial start of the cluster. |
PartialRecoveryMostRecent | Allows partial start with the members that have the most recent partition table. |
PartialRecoveryMostComplete | Allows partial start with the members that have the most complete partition table. |
Configuring Force/Partial Start
To recover a cluster that has Persistence enabled after a cluster-wide restart, you can trigger Force Start, which forces the cluster to delete its persistence stores when one or more members fail to restart. Alternatively, you can trigger Partial Start, which forces the cluster to start without some members.
To trigger the cluster recovery action, set startupAction:
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.4.0-slim'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 8Gi
      storageClassName: "standard"
    startupAction: ForceStart
PartialStart can be triggered only when clusterDataRecoveryPolicy is set to PartialRecoveryMostRecent or PartialRecoveryMostComplete.
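As an illustration of that constraint, the following sketch pairs PartialStart with a partial recovery policy. It reuses the field names and values from the ForceStart example and changes only the policy and the startup action, so treat it as an example under those assumptions rather than a recommended configuration.

```yaml
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  persistence:
    # PartialStart requires one of the two partial recovery policies.
    clusterDataRecoveryPolicy: "PartialRecoveryMostRecent"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 8Gi
    startupAction: PartialStart
```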