Shutting Down Members and Clusters

Maintaining a cluster often requires shutting down the whole cluster or individual members so that changes take effect. Hazelcast provides methods for shutting down a cluster, as well as graceful and ungraceful options for shutting down specific members.

Shutting Down a Hazelcast Cluster

To send a command to shut down the cluster, use one of the following options:

CLI
REST API
Java
Management Center

To use the CLI to shut down a cluster, you must first enable the REST API.

bin/hz-cluster-admin -a <address> -c <cluster-name> -o shutdown

The REST API has been deprecated and will be removed as of Hazelcast version 7.0. An improved version of this feature is under development.

To use the REST API to shut down a cluster, you must first enable it.

curl --data "${CLUSTERNAME}&${PASSWORD}"  http://127.0.0.1:${PORT}/hazelcast/rest/management/cluster/clusterShutdown
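The same request can also be issued from Java code. The following is a minimal sketch that only builds the request with the JDK HTTP client (Java 11 or later) and does not send it. The class name, method name, port 5701 and the credentials are illustrative assumptions, and the deprecation notice above applies to this call as well.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class ClusterShutdownRequest {

    // Builds (but does not send) a POST request that mirrors the curl command:
    // the body is "<cluster-name>&<password>", passed through as-is like curl --data does.
    static HttpRequest build(String host, int port, String clusterName, String password) {
        URI uri = URI.create("http://" + host + ":" + port
                + "/hazelcast/rest/management/cluster/clusterShutdown");
        String body = clusterName + "&" + password;
        return HttpRequest.newBuilder(uri)
                // curl --data uses this content type by default
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your member address and credentials.
        HttpRequest request = build("127.0.0.1", 5701, "dev", "dev-pass");
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

Keeping the request construction separate from sending makes it easy to inspect the URL and body before running it against a live cluster.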

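To use Java to shut down a cluster, call the shutdown() method on the cluster of any member instance:

HazelcastInstance.getCluster().shutdown()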

Use Hazelcast Management Center to shut down your whole Hazelcast cluster.

When you send the command to shut down a cluster, the members that are not in a PASSIVE state temporarily change their states to PASSIVE. Then, each member shuts itself down.

When the cluster is restarted, the state of each member depends on whether persistence is enabled:

If persistence is enabled, the state of each member is restored to whatever it was before the shutdown.
If persistence is disabled, each member is put in an ACTIVE state.
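The restart rule can be expressed as a small function. This is only an illustrative model, not Hazelcast code: the State enum is a local stand-in whose values are assumed to match the states described in Cluster States, and the class and method names are hypothetical.

```java
final class RestartStateModel {

    // Local stand-in for the cluster states; not Hazelcast's own type.
    enum State { ACTIVE, NO_MIGRATION, FROZEN, PASSIVE }

    // With persistence the previous state is restored; without it the member starts as ACTIVE.
    static State stateAfterRestart(State beforeShutdown, boolean persistenceEnabled) {
        return persistenceEnabled ? beforeShutdown : State.ACTIVE;
    }

    public static void main(String[] args) {
        System.out.println(stateAfterRestart(State.FROZEN, true));
        System.out.println(stateAfterRestart(State.FROZEN, false));
    }
}
```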

For information about states and what they mean, see Cluster States.

Shutting Down a Hazelcast Member

To shut down a specific member, use one of the following options:

Graceful shutdown: To prevent data loss.
Ungraceful shutdown: To shut down faster while risking data loss.

Graceful Shutdown

A graceful shutdown gives the Hazelcast cluster time to migrate the member’s partitions to the rest of the cluster, which prevents data loss.

During a graceful shutdown, the oldest cluster member (the master member) migrates all the replicas owned by the shutdown-requesting member to the other running cluster members.

While the master member is migrating replicas, the rest of the cluster waits for a configured period of time until this process is finished. You can specify this period using the hazelcast.graceful.shutdown.max.wait property. If migrations are not completed within this period, the shutdown process may continue ungracefully, which may lead to data loss.

When choosing a value for this period, consider the size of data in your cluster and how long it may take to migrate it.
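As a rough way to pick a value, you can estimate the migration time from the amount of data on the member and an assumed migration throughput, then add a safety margin. The sketch below is an assumption-laden illustration: the class and method names, the throughput figure and the safety factor are hypothetical, and it assumes the property value is expressed in seconds and is set as a JVM system property before the member starts.

```java
final class GracefulShutdownWait {

    // Estimated wait = data to migrate / assumed throughput, scaled by a safety factor and rounded up.
    static long estimateMaxWaitSeconds(long memberDataBytes, long migrationBytesPerSecond, double safetyFactor) {
        if (migrationBytesPerSecond <= 0) {
            throw new IllegalArgumentException("migrationBytesPerSecond must be positive");
        }
        double seconds = (double) memberDataBytes / migrationBytesPerSecond * safetyFactor;
        return (long) Math.ceil(seconds);
    }

    public static void main(String[] args) {
        long dataBytes = 20L * 1024 * 1024 * 1024; // 20 GiB held by the member (example figure)
        long throughput = 50L * 1024 * 1024;       // 50 MiB/s assumed migration rate (measure your own)
        long waitSeconds = estimateMaxWaitSeconds(dataBytes, throughput, 2.0);

        // Must be set before the member is started in this JVM.
        System.setProperty("hazelcast.graceful.shutdown.max.wait", Long.toString(waitSeconds));
        System.out.println("hazelcast.graceful.shutdown.max.wait=" + waitSeconds); // prints 820 for these inputs
    }
}
```

Measure the migration rate in your own environment rather than relying on the example figure, since it depends on network bandwidth and entry sizes.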

After these migrations are completed, the shutdown-requesting member will no longer be the owner of any partitions or backup partitions.

To gracefully shut down a Hazelcast member (so that it waits for the migration operations to complete), you have the following options:

Call the HazelcastInstance.shutdown() method.
Use the JMX API’s shutdown method. You can do this by implementing a JMX client application or using a JMX monitoring tool such as JConsole.
Use Management Center.

Ungraceful Shutdown

If you need to shut down a member quickly and you do not care about data loss, you can use one of the following options to force shutdown without waiting for migrations:

Shutdown option	Methods	Notes
Send a SIGKILL signal	Use the `kill -9 <PID>` command in the terminal.	This option does not release the member’s used resources and is not recommended for production systems.
Send a SIGTERM signal	Use the `kill -15 <PID>` command in the terminal. Call the `HazelcastInstance.getLifecycleService().terminate()` method. Execute the `stop.sh` script in your member’s `bin` directory.	This option releases most of the member’s used resources. If you set the `hazelcast.shutdownhook.enabled` property to `false`, your member does not handle the `SIGTERM` signal and the process terminates ungracefully, as if it had received a `SIGKILL` signal. If you set the `hazelcast.shutdownhook.policy` property to `GRACEFUL`, your member shuts down gracefully.
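The difference between the two signals comes down to whether the JVM gets a chance to run its shutdown hooks. The following sketch uses only the plain JDK shutdown hook mechanism to illustrate this; it is not Hazelcast's own hook, and the class and method names are hypothetical. A hook registered this way runs on SIGTERM (and on normal exit) but never on SIGKILL.

```java
final class ShutdownHookSketch {

    // Registers a JVM shutdown hook and returns the thread so that it can be removed later.
    static Thread registerCleanupHook(Runnable cleanup) {
        Thread hook = new Thread(cleanup, "cleanup-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        registerCleanupHook(() -> System.out.println("Shutdown hook: releasing resources"));
        System.out.println("Running as PID " + ProcessHandle.current().pid());
        // While this sleeps, try kill -15 <PID> (the hook runs) or kill -9 <PID> (it does not).
        Thread.sleep(60_000);
    }
}
```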

If you stop a member with the systemd systemctl utility, for example systemctl stop service_name, systemd sends a SIGTERM signal. By default, if the process has not exited after 90 seconds, systemd follows up with a SIGKILL signal, which cuts short any shutdown that is still in progress. We do not recommend using this utility with its default settings.

Ensuring Members and Clusters are in a Safe State

Before you shut down members or clusters, you can check that their backup partitions are synchronized with the primary ones.

To check the state of partitions, use the following methods in the PartitionService interface:

public interface PartitionService {
    // ...
    boolean isClusterSafe();
    boolean isMemberSafe(Member member);
    boolean isLocalMemberSafe();
    boolean forceLocalMemberToBeSafe(long timeout, TimeUnit unit);
}

The isClusterSafe() method checks whether the cluster is in a safe state. It returns true if there are no active partition migrations and all backups are in sync for each partition.

The isMemberSafe() method checks whether a specific member is in a safe state. It checks if all backups of partitions of the given member are in sync with the primary ones. Once it returns true, the given member is safe and it can be shut down without data loss.

Similarly, the isLocalMemberSafe() method does the same check for the local member. The forceLocalMemberToBeSafe() method forces the owned and backup partitions to be synchronized, making the local member safe.
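These checks can be combined into a wait loop that runs before a shutdown. The following is a minimal sketch and not part of the Hazelcast API: SafeStateWaiter and awaitSafe are hypothetical names, and the check is passed in as a BooleanSupplier so that the example compiles without Hazelcast on the classpath. In a member, you would pass something like () -> hz.getPartitionService().isClusterSafe(), where hz is assumed to be your HazelcastInstance.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

final class SafeStateWaiter {

    // Polls safeCheck until it returns true or the timeout elapses.
    // Returns true if the check passed, false on timeout or interruption.
    static boolean awaitSafe(BooleanSupplier safeCheck, long timeout, TimeUnit unit, long pollMillis) {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            if (safeCheck.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for the real safety check: reports "safe" from the third poll on.
        AtomicInteger polls = new AtomicInteger();
        boolean safe = awaitSafe(() -> polls.incrementAndGet() >= 3, 5, TimeUnit.SECONDS, 100);
        System.out.println(safe ? "Safe to shut down" : "Not safe yet, shutdown postponed");
    }
}
```

If the loop times out, it is usually better to postpone the shutdown than to proceed, because a timeout means that some backups may still be out of sync.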

For code samples, see GitHub.