CP Member Shutdown

Please read this part carefully to notice the behavioral difference in the CP member shutdown process when CP Subsystem Persistence is enabled and disabled.

There is a significant behavioral difference during the CP member shutdown when CP Subsystem Persistence is enabled and disabled. When disabled (the default mode in which CP Subsystem works only in memory), a shutting down CP member is replaced with other available CP members in all of its CP groups in order not to decrease or more importantly not to lose majorities of CP groups. It is because CP members keep their local state only in memory when CP Subsystem Persistence is disabled, hence a shut-down CP member cannot join back with its CP identity and state, hence it is better to remove it from CP Subsystem to not to harm availability of CP groups. If there is no other available CP member to replace a shutting down CP member in a CP group, that CP group’s size is reduced by 1 and its majority value is recalculated. On the other hand, when CP Subsystem Persistence is enabled, a shut-down CP member can come back by restoring its CP state. Therefore, it is not automatically removed from CP Subsystem when CP Subsystem Persistence is enabled. It is up to you to remove shut-down CP members via CPSubsystemManagementService.removeCPMember(String) if they will not come back.

In summary, CP member shutdown behavior is as follows:

When CP Subsystem Persistence is disabled (the default mode), shutting down CP members are removed from CP Subsystem and the CP group majority values are recalculated.
When CP Subsystem Persistence is enabled, shutting down CP members are still kept in CP Subsystem so they will be a part of the CP group majority calculations.

Moreover, there is a subtle point about concurrent shutdown of CP members when CP Subsystem Persistence is disabled. If there are N CP members in CP Subsystem, HazelcastInstance.shutdown() can be called on N-2 CP members concurrently. Once these N-2 CP members complete their shutdown, the remaining 2 CP members must be shut down serially. Even though the shutdown API can be called concurrently on multiple members, the METADATA CP group handles shutdown requests serially. Therefore, it would be simpler to shut down CP members one by one, by calling HazelcastInstance.shutdown() on the next CP member once the current CP member completes its shutdown. This rule does not apply when CP Subsystem Persistence is enabled so you can shut down your CP members concurrently if you enabled CP Subsystem Persistence. It is enough for users to recall this rule while shutting down CP members when CP Subsystem Persistence is disabled. If interested, you can read the rest of this paragraph to learn the reasoning behind this rule. Each shutdown request internally requires a Raft commit to the METADATA CP group when CP Subsystem Persistence is disabled. A CP member proceeds to shutdown after it receives a response of this commit. To be able to perform a Raft commit, the METADATA CP group must have its majority up and running. When only 2 CP members are left after graceful shutdowns, the majority of the METADATA CP group becomes 2. If the last 2 CP members shut down concurrently, one of them is likely to perform its Raft commit faster than the other one and leave the cluster before the other CP member completes its Raft commit. In this case, the last CP member waits for a response of its commit attempt on the METADATA CP group, and times out eventually. This situation causes an unnecessary delay on the shutdown process of the last CP member. On the other hand, when the last 2 CP members shut down serially, the N-1th member receives the response of its commit after its shutdown request is committed also on the last CP member. Then, the last CP member checks its local data to notice that it is the last CP member alive, and proceeds its shutdown without attempting a Raft commit on the METADATA CP group.

CP Member Shutdown

Send us your feedback

Help and support