Shutting Down Members and Clusters
Maintaining a cluster often requires shutting down clusters or individual members to make changes take effect. Hazelcast offers convenient methods for shutting down clusters as well as graceful and ungraceful options for specific members.
Shutting Down a Hazelcast Cluster
To send a command to shut down the cluster, use one of the following options:
To use the CLI to shut down a cluster, you must first enable the REST API. |
bin/hz-cluster-admin -a <address> -c <cluster-name> -o shutdown
To use the REST API to shut down a cluster, you must first enable it. |
curl --data "${CLUSTERNAME}&${PASSWORD}" http://127.0.0.1:${PORT}/hazelcast/rest/management/cluster/clusterShutdown
HazelcastInstance.getCluster().shutdown()
Use Hazelcast Management Center to shut down your whole Hazelcast cluster.
When you send the command to shut down a cluster, the members that are not in a PASSIVE
state temporarily change their states to PASSIVE
. Then, each member shuts itself down.
When the cluster is restarted, the state of each member depends on whether persistence is enabled:
-
If persistence is enabled, the state of each member is restored to whatever it was before the shutdown.
-
If persistence is disabled, each member is put in an
ACTIVE
state.
For information about states and what they mean, see Cluster States.
Shutting Down a Hazelcast Member
To shut down a specific member, use one of the following options:
-
Graceful shutdown: To prevent data loss.
-
Ungraceful shutdown: To shut down faster while risking data loss.
Graceful Shutdown
When you gracefully shut down a member, it allows a Hazelcast cluster to migrate the member’s partitions to the rest of the cluster, preventing data loss.
During a graceful shutdown the oldest cluster member (master member) migrates all the replicas owned by the shutdown-requesting member to the other running cluster members.
While the master member is migrating replicas, the rest of the cluster waits for a configured period of time until this process is finished. You can specify this period using the hazelcast.graceful.shutdown.max.wait
property. If migrations are not completed within this period, the shutdown process may continue ungracefully, which may lead to data loss.
When choosing a value for this period, consider the size of data in your cluster and how long it may take to migrate it. |
After these migrations are completed, the shutdown-requesting member will no longer be the owner of any partitions or backup partitions.
To gracefully shut down a Hazelcast member (so that it waits the migration operations to be completed), you have the following options:
-
Call the
HazelcastInstance.shutdown()
method. -
Use the JMX API’s shutdown method. You can do this by implementing a JMX client application or using a JMX monitoring tool such as JConsole.
-
Use Management Center.
Ungraceful Shutdown
If you need to shut down a member quickly and you do not care about data loss, you can use one of the following options to force shutdown without waiting for migrations:
Shutdown option | Methods | Notes |
---|---|---|
Send a SIGKILL signal |
Use the |
This option does not release the member’s used resources and is not recommended for production systems. |
Send a SIGTERM signal |
Use the Call
the Execute
the |
This option releases most of the member’s used resources. If you set the If you set the |
If you use the systemd systemctl
utility such as systemctl stop service_name
, a SIGTERM signal is sent.
After 90 seconds of waiting, this signal is followed by a SIGKILL signal by default.
We do not recommend using this utility with its default settings.
Ensuring Members and Clusters are in a Safe State
Before you shut down members or clusters, you can check that their backup partitions are synchronized with the primary ones.
To check the state of partitions, use the following methods in the PartitionService
interface:
public interface PartitionService {
...
...
boolean isClusterSafe();
boolean isMemberSafe(Member member);
boolean isLocalMemberSafe();
boolean forceLocalMemberToBeSafe(long timeout, TimeUnit unit);
}
The isClusterSafe()
method checks whether the cluster is in a safe state.
It returns true
if there are no active partition migrations and all backups are in sync for each partition.
The isMemberSafe()
method checks whether a specific member is in a safe state.
It checks if all backups of partitions of the given member are in sync with the primary ones.
Once it returns true
, the given member is safe and it can be shut down without data loss.
Similarly, the isLocalMemberSafe()
method does the same check for the local member.
The forceLocalMemberToBeSafe()
method forces the owned and backup partitions to be synchronized,
making the local member safe.
For code samples, see GitHub.