Persistence, Backup and Restore
Persistence allows individual members and whole clusters to recover data by persisting map entries, JCache data, and streaming job snapshots on disk. Members can use persisted data to recover from a planned shutdown (including rolling upgrades), a sudden cluster-wide crash, or a single member failure.
This topic assumes that you know about Persistence in Hazelcast. To learn about Persistence, see the Platform documentation.
The Operator supports two options for enabling Persistence: PVC and HostPath. We recommend using PVC, so the examples on this page use this option. If you need to use HostPath, see HostPath Support for Persistence.
Before you back up and restore Hazelcast data, you need to enable Persistence in the Hazelcast
resource.
There are two options for backup and restore: External and Local (the default).
Enabling Local Persistence
- Create the Hazelcast resource:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/" (1)
    clusterDataRecoveryPolicy: "FullRecoveryOnly" (2)
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi (3)
(1) Base directory of the backup data.
(2) Cluster recovery policy.
(3) Size of the PersistentVolumeClaim (PVC) where Hazelcast data is persisted. Make sure to calculate the total disk space that you will use; it may be larger than the size of the in-memory data, depending on how many backups you take.

- Apply the resource:
kubectl apply -f ./hazelcast-persistence.yaml
- Check that Hazelcast members are ready:
kubectl get pods -l app.kubernetes.io/instance=hazelcast
NAME          READY   STATUS    RESTARTS   AGE
hazelcast-0   1/1     Running   0          2m
hazelcast-1   1/1     Running   0          1m
hazelcast-2   1/1     Running   0          1m
- Check that Hazelcast PVCs are created:
kubectl get pvc -l app.kubernetes.io/instance=hazelcast
NAME                                  STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       2m
hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       1m
hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       1m
Triggering Local Backups
After Persistence is enabled for the cluster, you can trigger backups using a HotBackup resource.
- Create the HotBackup resource:

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
- Apply the resource:
kubectl apply -f ./hot-backup.yaml
Scheduling Backups
You can schedule regular backups using the spec.schedule field of a HotBackup resource.
- Create the HotBackup resource with a schedule field:

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  schedule: "* 0-23/6 * * *"
This field takes a valid cron expression. For example:

30 10 * * *       At 10:30 AM every day
0 0 1,15,25 * *   At midnight on the 1st, 15th and 25th of each month
@monthly          Once a month, at midnight on the first day of the month
For the full list of supported expressions, see the library documentation.
- Apply the resource:
kubectl apply -f ./hot-backup-scheduled.yaml
Checking the Status of a Backup
To check the status of a backup, run the following command:
kubectl get hotbackup hot-backup
The status of the backup is displayed in the output.
NAME STATUS
hot-backup Success
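If you want to follow the backup while it runs, standard kubectl options also work with the HotBackup resource. This is a minimal sketch that assumes only the resource name used in the examples above:

# Watch the STATUS column until the backup finishes
kubectl get hotbackup hot-backup -w

# Inspect the full status of the resource for more detail
kubectl get hotbackup hot-backup -o yaml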
Restoring from Local Backups
To restore a cluster from local backups, you can directly reapply the Hazelcast resource, and it will have access to the PVCs that contain the persisted data.
- Delete the Hazelcast cluster:
kubectl delete hazelcast hazelcast
- Check that the PVCs are still available:
kubectl get pvc -l app.kubernetes.io/instance=hazelcast
NAME                                  STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       6m
hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       5m
hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       5m
- Reapply the Hazelcast resource:

kubectl apply -f ./hazelcast-persistence.yaml
- Check that the same PVCs are bound to the new pods:
kubectl get pods -l app.kubernetes.io/instance=hazelcast -ojsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}'
You should see something like the following:
hot-restart-persistence-hazelcast-0 hot-restart-persistence-hazelcast-1 hot-restart-persistence-hazelcast-2
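To double-check that the members recovered the persisted data rather than starting empty, you can also inspect the custom resource and the member logs. This is a minimal sketch; the exact log wording depends on the Hazelcast Platform version:

# The Hazelcast custom resource should report a running cluster again
kubectl get hazelcast hazelcast

# Member startup logs should mention the Hot Restart recovery (wording varies by version)
kubectl logs hazelcast-0 | grep -i "hot restart"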
Enabling External Persistence
In some cases, keeping the data only in PVCs and restoring it from them is not enough, for example when you want to move data between two Kubernetes clusters. You can use external storage, such as a cloud bucket, to make the backup data portable.
To enable external persistence, you need two additional configurations: backupType and the agent configuration. When external persistence is used, the Backup Agent container is deployed in the same Pod as the Hazelcast container.
- Create the Hazelcast resource:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    backupType: "External" (1)
    baseDir: "/data/hot-restart/"
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    agent: (2)
      repository: hazelcast/platform-operator-agent
(1) The type of storage where the backup data will be kept. It is either Local or External. The default value is Local.
(2) The agent that is responsible for backing data up to the external storage. The agent configuration is optional. If you set backupType to External and do not pass the agent configuration, the Operator uses the latest stable version of the agent.

- Apply the resource:
kubectl apply -f ./hazelcast-persistence-agent.yaml
- Check that Hazelcast members are ready:
kubectl get pods -l app.kubernetes.io/instance=hazelcast
NAME          READY   STATUS    RESTARTS   AGE
hazelcast-0   2/2     Running   0          2m55s
hazelcast-1   2/2     Running   0          2m15s
hazelcast-2   2/2     Running   0          99s
- Check that Hazelcast PVCs are created:
kubectl get pvc -l app.kubernetes.io/instance=hazelcast
NAME                                  STATUS   VOLUME                                      CAPACITY   ACCESS MODES   STORAGECLASS   AGE
hot-restart-persistence-hazelcast-0   Bound    pvc-116b4084-a436-4462-b413-511b77df307b   20Gi       RWO            standard       2m
hot-restart-persistence-hazelcast-1   Bound    pvc-a7711b6b-dcbf-45cb-9577-8ce6b1892f2f   20Gi       RWO            standard       1m
hot-restart-persistence-hazelcast-2   Bound    pvc-64314d82-da7a-4e38-bd2d-e770a63dc4e8   20Gi       RWO            standard       1m
Triggering External Backups
To trigger an external backup, you need to configure a bucket URI and a secret to tell Hazelcast where to store backup data and how to authenticate.
- Create the HotBackup resource:

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "s3://operator-backup" (1)
  secret: "br-secret-s3" (2)
(1) The bucket URI where the backup data will be stored.
(2) Name of the secret with credentials for accessing the given bucket URI.

- Create the secret for your cloud provider. A concrete S3 sketch follows this procedure.

For AWS, see AWS Session to learn about the authentication procedure:

kubectl create secret generic <secret-name> --from-literal=region=<region> \
  --from-literal=access-key-id=<access-key-id> \
  --from-literal=secret-access-key=<secret-access-key>

For GCP, see Application Default Credentials to learn about the authentication procedure:

kubectl create secret generic <secret-name> --from-file=google-credentials-path=<service_account_json_file>

For Azure, see Azure Storage Account Keys to learn about the authentication procedure:

kubectl create secret generic <secret-name> \
  --from-literal=storage-account=<storage-account> \
  --from-literal=storage-key=<storage-key>
- Apply the resource:
kubectl apply -f ./hot-backup-agent.yaml
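As mentioned in the secret step above, creating the br-secret-s3 secret referenced by the HotBackup example could look like the following for S3; the region and credential values are placeholders:

# Create the secret with the same name that the HotBackup spec references
kubectl create secret generic br-secret-s3 --from-literal=region=<region> \
  --from-literal=access-key-id=<access-key-id> \
  --from-literal=secret-access-key=<secret-access-key>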
Scheduling External Backups
You can schedule external backups using the spec.schedule field of a HotBackup resource.
- Create the HotBackup resource with a schedule field:

apiVersion: hazelcast.com/v1alpha1
kind: HotBackup
metadata:
  name: hot-backup
spec:
  hazelcastResourceName: hazelcast
  bucketURI: "s3://operator-backup"
  secret: "br-secret-s3"
  schedule: "* 0-23/6 * * *"
This field takes a valid cron expression. For example:

30 10 * * *       At 10:30 AM every day
0 0 1,15,25 * *   At midnight on the 1st, 15th and 25th of each month
@monthly          Once a month, at midnight on the first day of the month
For the full list of supported expressions, see the library documentation.
- Apply the resource:
kubectl apply -f ./hot-backup-agent-scheduled.yaml
Checking the Status of a Backup
To check the status of a backup, run the following command:
kubectl get hotbackup hot-backup
The status of the backup is displayed in the output.
NAME STATUS
hot-backup Success
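If you also want to confirm that the backup data actually reached the bucket, you can list its contents with your cloud provider's CLI. The command below is a sketch for the S3 bucket used in these examples and assumes the AWS CLI is configured with credentials that can read the bucket:

# Backups appear under timestamped prefixes (see the restore example below)
aws s3 ls s3://operator-backup/ --recursive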
Restoring from External Backups
To restore a cluster from external backups, you must enable external restore by adding a restore configuration under persistence.
As with external persistence, you may pass the agent configuration; if you do not, the default agent configuration is used.
When the external restore mechanism is used, the Restore Agent container is deployed in the same Pod as the Hazelcast container. The agent runs as an initContainer that starts before the Hazelcast container.
- Delete the Hazelcast cluster:
kubectl delete hazelcast hazelcast
- Recreate the Hazelcast resource:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    pvc:
      accessModes: ["ReadWriteOnce"]
      requestStorage: 20Gi
    restore:
      bucketURI: "s3://operator-backup?prefix=hazelcast/2022-06-08-17-01-20/" (1)
      secret: br-secret-s3 (2)
    agent: (3)
      repository: hazelcast/platform-operator-agent
(1) The bucket URI where the backup data will be restored from.
(2) Name of the secret with credentials for accessing the given bucket URI.
(3) The agent that is responsible for restoring data from the external storage. The agent configuration is optional. If you add restore under persistence and do not pass the agent configuration, the Operator uses the latest stable version of the agent.

- If you have not created the secret yet, create it in the same way as described in Triggering External Backups.
- Apply the Hazelcast resource:

kubectl apply -f ./hazelcast-persistence-restore.yaml
- Check that the same PVCs are bound to the new pods:
kubectl get pods -l app.kubernetes.io/instance=hazelcast -ojsonpath='{.items[*].spec.volumes[?(@.name=="hot-restart-persistence")].persistentVolumeClaim.claimName}'
You should see something like the following:
hot-restart-persistence-hazelcast-0 hot-restart-persistence-hazelcast-1 hot-restart-persistence-hazelcast-2
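Optionally, you can verify that the restore agent ran as an initContainer before the Hazelcast container started. This is a sketch using standard kubectl options; replace the placeholder with the container name reported by the first command:

# The restore agent should be listed among the Pod's init containers
kubectl get pod hazelcast-0 -o jsonpath='{.spec.initContainers[*].name}'

# Its logs show what was restored from the bucket
kubectl logs hazelcast-0 -c <restore-agent-container-name>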
Data Recovery Timeout
To set a data recovery timeout, use the dataRecoveryTimeout field. The field takes an integer value representing the timeout in seconds, and the Operator uses this value to set the validation-timeout-seconds and data-load-timeout-seconds Hazelcast Persistence options.
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
name: hazelcast
spec:
clusterSize: 3
repository: 'docker.io/hazelcast/hazelcast-enterprise'
version: '5.1.3-slim'
licenseKeySecret: hazelcast-license-key
persistence:
baseDir: "/data/hot-restart/"
clusterDataRecoveryPolicy: "FullRecoveryOnly"
dataRecoveryTimeout: 600
pvc:
accessModes: ["ReadWriteOnce"]
requestStorage: 20Gi
Choosing a Cluster Recovery Policy
To decide how a cluster should behave when one or more members cannot rejoin after a cluster-wide restart, you can define one of the following cluster recovery policies. The Operator supports all the policies in the Hazelcast Platform cluster-data-recovery-policy configuration options. For complete descriptions and advice on choosing a policy, see the Platform documentation.
| FullRecoveryOnly | Does not allow a partial start of the cluster. |
| PartialRecoveryMostRecent | Allows a partial start with the members that have the most recent partition table. |
| PartialRecoveryMostComplete | Allows a partial start with the members that have the most complete partition table. |
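For example, to let the cluster start even when some members cannot recover their data, you could set one of the partial recovery policies instead of the FullRecoveryOnly value used elsewhere on this page. A minimal sketch of only the relevant part of the Hazelcast spec, assuming the policy names shown in the table above:

persistence:
  baseDir: "/data/hot-restart/"
  clusterDataRecoveryPolicy: "PartialRecoveryMostRecent"
  pvc:
    accessModes: ["ReadWriteOnce"]
    requestStorage: 20Gi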
Configuring Force-Start
If you use the FullRecoveryOnly policy, you can configure the Operator to detect failed Hazelcast members and automatically trigger a force-start. The Operator will trigger a force-start only if the cluster is in a PASSIVE state.

The cluster loses all persisted data after a force-start.
Enable autoForceStart:
apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
name: hazelcast
spec:
clusterSize: 3
repository: 'docker.io/hazelcast/hazelcast-enterprise'
version: '5.1.3-slim'
licenseKeySecret: hazelcast-license-key
persistence:
baseDir: "/data/hot-restart/"
autoForceStart: true
HostPath Support for Persistence
You can also use HostPath to enable persistence.
HostPath support is discouraged for production environments for the reasons mentioned in the Kubernetes documentation.
HostPath support expects the cluster size to be equal to the number of Kubernetes nodes and the pods to be distributed evenly across the nodes. You can manage how pods are distributed among nodes by setting the topologySpreadConstraints field, which is described in Scheduling Hazelcast Pods.
- Create the Hazelcast resource with clusterSize equal to the number of Kubernetes nodes and set the proper topologySpreadConstraints:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  version: '5.1.3-slim'
  licenseKeySecret: hazelcast-license-key
  persistence:
    baseDir: "/data/hot-restart/"
    clusterDataRecoveryPolicy: "FullRecoveryOnly"
    hostPath: "/tmp/hazelcast"
  scheduling:
    topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/instance: hazelcast
- Apply the Hazelcast resource:
kubectl apply -f ./hazelcast-persistence-hostpath.yaml
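To verify that the spread constraint worked and each member landed on a different Kubernetes node, list the pods together with their nodes. This is a quick check using standard kubectl output options:

# The NODE column should show a different node for each Hazelcast member
kubectl get pods -l app.kubernetes.io/instance=hazelcast -o wide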