This is a prerelease version.

Shutting Down the Cluster

Safety Checking Cluster Members

To prevent data loss when shutting down a cluster member, Hazelcast provides a graceful shutdown feature. You perform this shutdown by calling the method HazelcastInstance.shutdown().

The oldest cluster member migrates all the replicas owned by the shutdown-requesting member to the other running (not initiated shutdown) cluster members. After these migrations are completed, the shutting down member will not be the owner or a backup of any partition anymore. It means that you can shutdown any number of Hazelcast members in a cluster concurrently with no data loss.

Please note that the process of shutting down members waits for a predefined amount of time for the oldest member to migrate their partition replicas. You can specify this graceful shutdown timeout duration using the property hazelcast.graceful.shutdown.max.wait. Its default value is 10 minutes. If migrations are not completed within this duration, shutdown may continue non-gracefully and lead to data loss. Therefore, you should choose your own timeout duration considering the size of data in your cluster.

Ensuring Safe State with PartitionService

With the improvements in graceful shutdown procedure in previous Hazelcast releases, the following methods are not needed to perform graceful shutdown. Nevertheless, you can use them to check the current safety status of the partitions in your cluster.

public interface PartitionService {
   ...
   ...
    boolean isClusterSafe();
    boolean isMemberSafe(Member member);
    boolean isLocalMemberSafe();
    boolean forceLocalMemberToBeSafe(long timeout, TimeUnit unit);
}

The method isClusterSafe checks whether the cluster is in a safe state. It returns true if there are no active partition migrations and all backups are in sync for each partition.

The method isMemberSafe checks whether a specific member is in a safe state. It checks if all backups of partitions of the given member are in sync with the primary ones. Once it returns true, the given member is safe and it can be shut down without data loss.

Similarly, the method isLocalMemberSafe does the same check for the local member. The method forceLocalMemberToBeSafe forces the owned and backup partitions to be synchronized, making the local member safe.

See here for more PartitionService code samples.

Shutting Down a Hazelcast Cluster

Using the cluster.sh script:

To shutdown the cluster, use the following command:

./cluster.sh -o shutdown -a 172.16.254.1 -p 5702 -g test -P test

Similarly, you can use the following command for the same purpose:

./cluster.sh --operation shutdown --address 172.16.254.1 --port 5702 --clustername test --password test

Using the REST API:

To shutdown the cluster, use the following command:

curl --data "${CLUSTERNAME}&$\{PASSWORD}"  http://127.0.0.1:${PORT}/hazelcast/rest/management/cluster/clusterShutdown

Safely Shutting Down for Lossless Restart

For lossless restart to work the cluster must be shut down gracefully. When members are shutdown one by one in a rapid succession, Hazelcast triggers an automatic rebalancing process where backup partitions are promoted and new backups are created for each member. This may result in out-of-memory errors or data loss.

The entire cluster can be shut down using the hz-cluster-admin command line tool.

bin/jet-cluster-admin -a <address> -c <cluster-name> -o shutdown

For the command line tool to work, REST should be enabled in hazelcast.yaml:

hazelcast:
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_READ:
          enabled: true
        CLUSTER_WRITE:
          enabled: true
        HEALTH_CHECK:
          enabled: true
        HOT_RESTART:
          enabled: true

Alternatively, you can shutdown the cluster using the Hazelcast Management Center:

![Cluster Shutdown using Management Center](assets/management-center-shutdown.png)

Since the Persistence data is saved locally on each member, all the members must be present after the restart for Hazelcast to be able to reload the data. Beyond that, there’s no special action to take: as soon as the cluster re-forms, it will automatically reload the persisted snapshots and resume the jobs.

Shutting Down a Hazelcast Member

The following are the ways of shutting down a Hazelcast member:

  • You can call kill -9 <PID> in the terminal (which sends a SIGKILL signal). This results in the immediate shutdown which is not recommended for production systems. If you set the property hazelcast.shutdownhook.enabled to false and then kill the process using kill -15 <PID>, its result is the same (immediate shutdown).

  • You can call kill -15 <PID> in the terminal (which sends a SIGTERM signal), or you can call the method HazelcastInstance.getLifecycleService().terminate() programmatically, or you can use the script stop.sh located in your Hazelcast’s /bin directory. All three of them terminate your member ungracefully. They do not wait for migration operations, they force the shutdown. But this is much better than kill -9 <PID> since it releases most of the used resources.

  • In order to gracefully shutdown a Hazelcast member (so that it waits the migration operations to be completed), you have four options:

    • You can call the method HazelcastInstance.shutdown() programatically.

    • You can use JMX API’s shutdown method. You can do this by implementing a JMX client application or using a JMX monitoring tool (like JConsole).

    • You can set the property hazelcast.shutdownhook.policy to GRACEFUL and then shutdown by using kill -15 <PID>. Your member will be gracefully shutdown.

    • You can use the "Shutdown Member" button in the member view of Management Center.

If you use systemd’s systemctl utility, i.e., systemctl stop service_name, a SIGTERM signal is sent. After 90 seconds of waiting it is followed by a SIGKILL signal by default. Thus, it calls terminate at first and kill the member directly after 90 seconds. We do not recommend to use it with its defaults. But systemd is very customizable and well-documented, you can see its details using the command man systemd.kill. If you can customize it to shutdown your Hazelcast member gracefully (by using the methods above), then you can use it.

Using Management Center

You can also use Hazelcast Management Center to shutdown your Hazelcast cluster. See the Cluster State section in the Hazelcast Management Center documentation.