Backing Up the Persistence Store
You can trigger backups of your persistence store using the Java API, REST API, or Management Center. Backing up the persistence store is useful if you want to copy the data onto other clusters without having to shut down your cluster.
To back up data in the persistence store, the cluster must be configured with a directory in the
persistence.backup-dir configuration. See Configuring Persistence.
When a member receives a backup request, it becomes the coordinating member and sends a new backup sequence ID to all members.
If all members respond that no other backup is currently in progress and that no other backup request has already been made, then the coordinating member commands the cluster to start the backup process nearly instantaneously on all members.
During this process, each member creates a sequenced backup subdirectory in the
backup-dir directory with the name
To make the backup process more performant, the contents of files in the persistence store are not duplicated. Instead, members create a new file name for the same persisted contents on disk, using hard links. If the hard link fails for any reason, members continue by copying the data, but future backups will still try to use hard links.
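The link-then-copy behavior described above can be sketched with the standard java.nio.file API. This is only an illustration of the general technique, not Hazelcast's internal implementation; the class name HardLinkBackup and the method linkOrCopy are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkBackup {

    /**
     * Makes the contents of source available at target.
     * Tries a hard link first and falls back to a full copy.
     * Returns true if a hard link was created, false if the data was copied.
     */
    public static boolean linkOrCopy(Path source, Path target) throws IOException {
        try {
            // A hard link adds a second file name for the same bytes on disk,
            // so no data is duplicated.
            Files.createLink(target, source);
            return true;
        } catch (IOException | UnsupportedOperationException e) {
            // Linking can fail (for example, across file systems or where
            // links are unsupported), so copy the bytes instead.
            Files.copy(source, target);
            return false;
        }
    }
}
```

Because the fallback is decided per call, a later call still attempts a hard link first, which mirrors the statement that future backups keep trying to use hard links.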
Backups are transactional and cluster-wide, so either all or none of the members start the same backup sequence.
To trigger a new backup, you can use one of the following options:
Backups may be initiated during membership changes, partition table changes, or during normal data updates. As a result, some members can have outdated versions of data before they start the backup process and copy the stale persisted data. By putting your cluster in a PASSIVE state, you can make data more consistent on all members.
Trigger a backup.
PersistenceService service = member.getCluster().getPersistenceService();
service.backup();
The sequence number in sequenced backup subdirectories is generated by the backup process, but you can define your own sequence numbers as shown below:
PersistenceService service = member.getCluster().getPersistenceService();
long backupSeq = ...
service.backup(backupSeq);
Backups fail if any member contains a sequenced backup subdirectory with the same name.
Once the backup method has returned, all cluster metadata is copied and the exact partition data which needs to be copied is marked. After that, the backup process continues asynchronously and you can return the cluster to the ACTIVE state and resume operations.
Only cluster and distributed object metadata is copied synchronously during the invocation of the backup method. The rest of the persistence store is copied asynchronously after the method call has ended. You can track the progress of the backup process, using one of the following options:
An example of how to track the progress via the Java API is shown below:
PersistenceService service = member.getCluster().getPersistenceService();
BackupTaskStatus status = service.getBackupTaskStatus();
...
The returned object contains the local member’s backup status:
The backup state (NOT_STARTED, IN_PROGRESS, FAILURE, SUCCESS)
The completed count
The total count
The completed and total counts let you track the percentage of data
that has been copied. Currently, the counts are the number of copied
and total persistence stores on the local member, but this may change
later to provide finer resolution.
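As a minimal sketch of turning the two counts into a percentage, the helper below takes them as plain integers. The class BackupProgress and the method percentComplete are hypothetical names used only for illustration; how you obtain the counts from the status object is not shown here.

```java
public class BackupProgress {

    /**
     * Returns the share of completed work as a percentage.
     * Returns 0 when total is not positive, which avoids dividing by zero
     * before any work has been counted.
     */
    public static double percentComplete(int completed, int total) {
        if (total <= 0) {
            return 0.0;
        }
        // Multiply by 100.0 first so the division is done in floating point.
        return 100.0 * completed / total;
    }
}
```

Since the counts currently refer to whole persistence stores, the resulting percentage moves in coarse steps rather than continuously.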
Besides tracking the backup status through the API, you can view the status in
Management Center and inspect the on-disk files for each member.
Each member creates an inprogress file in each of the copied persistence stores.
The presence of this file means that the backup is currently in progress.
When the backup task completes, this file is removed. If an error occurs
during the backup task, the inprogress file is renamed to failure, and it
contains a stack trace of the exception.
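The on-disk check can be scripted. The sketch below looks for the inprogress and failure marker files in a single copied persistence store directory. The class BackupMarkerCheck, the Marker enum, and the inspect method are hypothetical, and the caller is assumed to know the path of the copied store.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class BackupMarkerCheck {

    /** What the marker files in a copied persistence store indicate. */
    public enum Marker { IN_PROGRESS, FAILED, NO_MARKER }

    /**
     * Inspects one copied persistence store directory.
     * NO_MARKER means neither file exists, for example because the
     * backup task finished and removed the inprogress file.
     */
    public static Marker inspect(Path copiedStoreDir) {
        // On error, inprogress is renamed to failure, so check failure first.
        if (Files.exists(copiedStoreDir.resolve("failure"))) {
            return Marker.FAILED;
        }
        if (Files.exists(copiedStoreDir.resolve("inprogress"))) {
            return Marker.IN_PROGRESS;
        }
        return Marker.NO_MARKER;
    }
}
```

When the result is FAILED, the failure file itself can be read to see the stack trace of the exception.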
Once the backup method call has returned and asynchronous copying of the partition data has started, the backup task can be interrupted. This is helpful in situations where the backup task has started at an inconvenient time. For instance, the backup task could be automated and it could be accidentally triggered during high load on the Hazelcast instances, causing the performance of the Hazelcast instances to drop.
The backup task mainly uses disk I/O, consumes little CPU and it generally does not last for a long time (although you should test it with your environment to determine the exact impact). Nevertheless, you can abort the backup tasks on all members via a cluster-wide interrupt operation. This operation can be triggered programmatically or from the Management Center.
An example of programmatic interruption is shown below:
PersistenceService service = member.getCluster().getPersistenceService();
service.interruptBackupTask();
...
This method sends an interrupt to all members. The interrupt is ignored if the backup task is not currently in progress, so you can safely call this method even if it has previously been called or when some members have already completed their local backup tasks.
You can also interrupt the local member backup task as shown below:
PersistenceService service = member.getCluster().getPersistenceService();
service.interruptLocalBackupTask();
...
The backup task stops as soon as possible. It does not remove the contents of the backup directory, so you must remove them manually.
To restore a cluster with data from a specific backup, do the following:
To start a new cluster from the backups of an existing cluster, do the following for each existing member before starting the cluster: Copy the contents of an existing member’s backup subdirectory to the directory that’s configured in a new member’s
The cluster on which you restore the backup must have the same number of members as the cluster that created the backups.
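Copying the contents of a backup subdirectory into a new member's directory can be done with any file tool. As one possible sketch in Java, the helper below copies a directory tree recursively. The class BackupRestoreCopy and the method copyContents are hypothetical, and both paths are assumptions that you supply: the member's backup subdirectory and the directory configured for the new member.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class BackupRestoreCopy {

    /**
     * Recursively copies everything under backupDir into targetDir.
     * Existing files in targetDir are not overwritten; a clash raises
     * FileAlreadyExistsException from Files.copy.
     */
    public static void copyContents(Path backupDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        // Files.walk visits a directory before its children, so parent
        // directories always exist by the time their files are copied.
        try (Stream<Path> paths = Files.walk(backupDir)) {
            paths.forEach(src -> {
                Path dst = targetDir.resolve(backupDir.relativize(src));
                try {
                    if (Files.isDirectory(src)) {
                        Files.createDirectories(dst);
                    } else {
                        Files.copy(src, dst);
                    }
                } catch (IOException e) {
                    // Lambdas cannot throw checked exceptions directly.
                    throw new UncheckedIOException(e);
                }
            });
        }
    }
}
```

Run the copy once per member, before starting the new cluster, so that each new member starts from exactly one existing member's backup.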