Backing Up Persisted Data
You can take snapshots of the persistence store (persisted files) to copy the data onto other clusters without shutting down your cluster. This process is called a hot backup.
When a member receives a backup request, it becomes the coordinating member and sends a new backup sequence ID to all members.
If all members respond that no other backup is currently in progress and that no other backup request has already been made, then the coordinating member commands the cluster to start the backup process nearly instantaneously on all members.
During this process, each member creates a sequenced backup subdirectory in the backup-dir directory with the name backup-<backupSeq>.
To make the backup process more performant, the contents of files in the persistence store are not duplicated. Instead, members create a new file name for the same persisted contents on disk, using hard links. If the hard link fails for any reason, members continue by copying the data, but future backups will still try to use hard links.
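The hard-link-with-copy-fallback behavior described above can be sketched in plain Java NIO. This is an illustration only, not Hazelcast code; the class and method names here are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class BackupFileLinker {

    // Try to hard-link the persisted file into the backup location so the
    // contents are not duplicated on disk; if hard links fail (for example,
    // across file systems), fall back to a full copy, mirroring the
    // behavior described above.
    public static boolean linkOrCopy(Path source, Path target) throws IOException {
        try {
            Files.createLink(target, source); // new name, same on-disk contents
            return true;                      // hard link succeeded
        } catch (UnsupportedOperationException | IOException e) {
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            return false;                     // fell back to copying the data
        }
    }
}
```

Note that the fallback only affects the current file; as the text says, future backups still attempt hard links first.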
Note: Backups are transactional and cluster-wide, so either all or none of the members start the same backup sequence.
To back up persisted data, you must first configure the backup directory in the persistence configuration:
<hazelcast>
    ...
    <persistence enabled="true">
        <backup-dir>/mnt/hot-backup</backup-dir>
        ...
    </persistence>
    ...
</hazelcast>
hazelcast:
  persistence:
    enabled: true
    backup-dir: /mnt/hot-backup
PersistenceConfig persistenceConfig = new PersistenceConfig();
persistenceConfig.setBackupDir(new File("/mnt/hot-backup"));
...
config.setPersistenceConfig(persistenceConfig);
To trigger a new backup, you can use one of the following options:
Backups may be initiated during membership changes, partition table changes, or during normal data updates. As a result, some members can have outdated versions of data before they start the backup process and copy the stale persisted data. By putting your cluster in a PASSIVE state, you can make the data more consistent on all members.
Trigger a backup.
PersistenceService service = member.getCluster().getPersistenceService();
service.backup();
The sequence number in sequenced backup subdirectories is generated by the hot backup process, but you can define your own sequence number as shown below:
PersistenceService service = member.getCluster().getPersistenceService();
long backupSeq = ...
service.backup(backupSeq);
Backups fail if any member contains a sequenced backup subdirectory with the same name.
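The naming and collision rule above might be sketched as follows. This is an illustrative helper, not part of the Hazelcast API; the class and method names are hypothetical, and the collision check mirrors the documented failure behavior:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BackupDirs {

    // Build the sequenced backup subdirectory name used inside backup-dir.
    public static String dirName(long backupSeq) {
        return "backup-" + backupSeq;
    }

    // Create the subdirectory for this sequence; per the text, the backup
    // fails if a subdirectory with the same name already exists.
    public static Path createSequencedDir(Path backupDir, long backupSeq) throws IOException {
        Path target = backupDir.resolve(dirName(backupSeq));
        if (Files.exists(target)) {
            throw new IOException("Backup directory already exists: " + target);
        }
        return Files.createDirectories(target);
    }
}
```

Choosing your own sequence numbers (for example, a timestamp) makes collisions less likely across repeated manual backups.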
Once the backup method has returned, all cluster metadata is copied and the exact partition data that needs to be copied is marked. After that, the backup process continues asynchronously, and you can return the cluster to the ACTIVE state and resume operations.
Only cluster and distributed object metadata is copied synchronously during the invocation of the backup method. The rest of the persistence store is copied asynchronously after the method call has ended. You can track the progress of the backup process, using one of the following options:
An example of how to track the progress via the Java API is shown below:
PersistenceService service = member.getCluster().getPersistenceService();
BackupTaskStatus status = service.getBackupTaskStatus();
...
The returned object contains the local member’s backup status:
The backup state (NOT_STARTED, IN_PROGRESS, FAILURE, SUCCESS)
The completed count
The total count
The completed and total counts give you a way to track the percentage of copied data. Currently, these counts represent the number of copied and total local member persistence stores, but this may change at a later point to provide greater resolution.
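Deriving a progress percentage from those two counts is straightforward; a minimal sketch (the class and method names are hypothetical, only the completed/total counts come from the status object described above):

```java
public class BackupProgress {

    // Derive a progress percentage from the completed and total counts
    // reported in the backup status (counts of local persistence stores).
    public static double percentComplete(int completed, int total) {
        if (total == 0) {
            return 0.0; // nothing to copy yet, avoid division by zero
        }
        return 100.0 * completed / total;
    }
}
```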
Besides tracking the persistence status via the API, you can view the status in Management Center, and you can inspect the on-disk files for each member.
Each member creates an inprogress file in each of the copied persistence stores, indicating that a backup is currently in progress. When the backup task completes the backup operation, this file is removed. If an error occurs during the backup task, the inprogress file is renamed to failure, and it contains a stack trace of the exception.
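Based on those marker-file semantics, inspecting a copied store's state on disk might look like this. This is an illustrative sketch, not a Hazelcast API; the class, enum, and method names are hypothetical, while the inprogress and failure file names come from the text above:

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class BackupMarkerFiles {

    enum StoreBackupState { IN_PROGRESS, FAILURE, COMPLETED }

    // Infer the backup state of a single copied persistence store from its
    // marker files: "inprogress" while copying, renamed to "failure"
    // (containing a stack trace) on error, removed entirely on success.
    public static StoreBackupState stateOf(Path storeDir) {
        if (Files.exists(storeDir.resolve("inprogress"))) {
            return StoreBackupState.IN_PROGRESS;
        }
        if (Files.exists(storeDir.resolve("failure"))) {
            return StoreBackupState.FAILURE;
        }
        return StoreBackupState.COMPLETED;
    }
}
```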
Once the backup method call has returned and asynchronous copying of the partition data has started, the backup task can be interrupted. This is helpful in situations where the backup task has started at an inconvenient time. For instance, the backup task could be automated and it could be accidentally triggered during high load on the Hazelcast instances, causing the performance of the Hazelcast instances to drop.
The backup task mainly uses disk I/O, consumes little CPU and it generally does not last for a long time (although you should test it with your environment to determine the exact impact). Nevertheless, you can abort the backup tasks on all members via a cluster-wide interrupt operation. This operation can be triggered programmatically or from the Management Center.
An example of programmatic interruption is shown below:
PersistenceService service = member.getCluster().getPersistenceService();
service.interruptBackupTask();
...
This method sends an interrupt to all members. The interrupt is ignored if a backup task is not currently in progress, so you can safely call this method even if it has previously been called or some members have already completed their local backup tasks.
You can also interrupt the local member backup task as shown below:
PersistenceService service = member.getCluster().getPersistenceService();
service.interruptLocalBackupTask();
...
The backup task stops as soon as possible. It does not remove the disk contents of the backup directory; you must remove them manually.
The backup process creates sequenced subdirectories named backup-<backupSeq> in the configured hot backup directory (backup-dir). To start a cluster with data from a specific backup, you need to set the base directory (base-dir) to the desired backup subdirectory.
For example, suppose your existing members use /opt/hz/backups as the backup-dir and the new members use /opt/hz/data as the base-dir. You would copy each existing member's backup subdirectory to the directory configured as the new member's base directory: that is, copy /opt/hz/backups/backup-<backupSeq>/* from the existing member to /opt/hz/data on the new member.
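In practice this copy is often done with shell tools, but the restore step can also be sketched in plain Java NIO. This is an illustrative helper, not a Hazelcast API; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class BackupRestore {

    // Recursively copy the contents of one member's backup-<backupSeq>
    // subdirectory into the new member's base-dir, e.g. from
    // /opt/hz/backups/backup-<backupSeq> to /opt/hz/data.
    public static void copyBackupToBaseDir(Path backupSeqDir, Path baseDir) throws IOException {
        try (Stream<Path> paths = Files.walk(backupSeqDir)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = baseDir.resolve(backupSeqDir.relativize(source).toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.createDirectories(target.getParent());
                    Files.copy(source, target);
                }
            }
        }
    }
}
```

Run the copy for each member before starting the new cluster, so every member finds its data under its own base-dir.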