CP Subsystem Management

Unlike the dynamic nature of Hazelcast clusters, CP Subsystem requires manual intervention while expanding/shrinking its size, or when a CP member crashes or becomes unreachable. When a CP member becomes unreachable, it is not automatically removed from CP Subsystem because it could be still alive and partitioned away.

Moreover, by default CP Subsystem works in memory without persisting any state to disk. It means that a crashed CP member will not be able to recover by reloading its previous state. Therefore, crashed CP members create a danger for gradually losing the majority of CP groups and eventually total loss of the availability of CP Subsystem. To prevent such situations, CPSubsystemManagementService offers a set of APIs. In addition, CP Subsystem Persistence can be enabled to make CP members persist their local CP state to stable storage. Please see CP Subsystem Persistence section for more details.

CP Subsystem relies on Hazelcast’s failure detectors to test reachability of CP members. Before removing a CP member from CP Subsystem, please make sure that it is declared as unreachable by Hazelcast’s failure detector and removed from Hazelcast cluster’s member list.

CP member additions and removals are internally handled by performing a single membership change at a time. When multiple CP members are shutting down concurrently, their shutdown process is executed serially. When a CP membership change is triggered, the METADATA CP group creates a membership change plan for CP groups. Then, the scheduled changes are applied to the CP groups one by one. After all CP group member removals are done, the shutting down CP member is removed from the active CP members list and its shutdown process is completed. A shut-down CP member is automatically replaced with another available CP member in all of its CP groups, including the METADATA CP group, in order not to decrease or more importantly not to lose the majority of CP groups. If there is no available CP member to replace a shutting down CP member in a CP group, that group’s size is reduced by 1 and its majority value is recalculated. Please note that this behavior is when CP Subsystem Persistence is disabled. When CP Subsystem Persistence is enabled, shut-down CP members are not automatically removed from the active CP members list and they are still considered as part of CP groups and majority calculations, because they can come back by restoring their local CP state from stable storage. If you know that a shut-down CP member will not be restarted, you need to remove that member from CP Subsystem via CPSubsystemManagementService.removeCPMember(String).

A new CP member can be added to CP Subsystem to either increase the number of available CP members for new CP groups or to fill the missing slots in existing CP groups. After the initial Hazelcast cluster startup is done, an existing Hazelcast member can be be promoted to the CP member role. This new CP member automatically joins to CP groups that have missing members, and majority values of these CP groups are recalculated.

CP Subsystem Management APIs

You can access the CP Subsystem management APIs using the Java API or REST interface. To communicate with the REST interface there are two options; one is to access REST endpoint URL directly or using the cp-subsystem.sh shell script, which comes with the Hazelcast package.

The cp-cluster.sh script uses curl command, and curl must be installed to be able to use the script.
  • Get Local CP Member:

    Returns the local CP member if this Hazelcast member is a part of CP Subsystem.

    Java API
            CPMember localMember = cpSubsystem.getLocalCPMember();
    REST API
    > curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members/local
    OR
    > sh cp-subsystem.sh -o get-local-member --address 127.0.0.1 --port 5701
    +
    Sample Response:
    {
        "uuid": "6428d7fd-6079-48b2-902c-bdf6a376051e",
        "address": "[127.0.0.1]:5701"
    }
  • Get CP Groups:

    Returns the list of active CP groups.

    Java API
            CPSubsystemManagementService managementService = cpSubsystem.getCPSubsystemManagementService();
            CompletionStage<Collection<CPGroupId>> future = managementService.getCPGroupIds();
            Collection<CPGroupId> groups = future.toCompletableFuture().get();
    REST API
    > curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/groups
    OR
    > sh cp-subsystem.sh -o get-groups --address 127.0.0.1 --port 5701
    +
    Sample Response:
    [{
        "name": "METADATA",
        "id": 0
    }, {
        "name": "atomics",
        "id": 8
    }, {
        "name": "locks",
        "id": 14
    }]
  • Get a single CP Group:

    Returns the active CP group with the given name. There can be at most one active CP group with a given name.

    Java API
            CPSubsystemManagementService managementService = cpSubsystem.getCPSubsystemManagementService();
            CompletionStage<CPGroup> future = managementService.getCPGroup(groupName);
            CPGroup group = future.toCompletableFuture().get();
    REST API
    > curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/groups/$\{CPGROUP_NAME}
    OR
    > sh cp-subsystem.sh -o get-group --group $\{CPGROUP_NAME} --address 127.0.0.1 --port 5701
    +
    Sample Response:
    {
        "id": {
            "name": "locks",
            "id": 14
        },
        "status": "ACTIVE",
        "members": [{
            "uuid": "33f84b0f-46ba-4a41-9e0a-29ee284c1c2a",
            "address": "[127.0.0.1]:5703"
        }, {
            "uuid": "59ca804c-312c-4cd6-95ff-906b2db13acb",
            "address": "[127.0.0.1]:5704"
        }, {
            "uuid": "777ff6ea-b8a3-478d-9642-47d1db019b37",
            "address": "[127.0.0.1]:5705"
        }, {
            "uuid": "c7856e0f-25d2-4717-9919-88fb3ecb3384",
            "address": "[127.0.0.1]:5702"
        }, {
            "uuid": "c6229b44-8976-4602-bb57-d13cf743ccef",
            "address": "[127.0.0.1]:5701"
        }]
    }
  • Get CP Members:

    Returns the list of active CP members in the cluster.

    Java API
            CPSubsystemManagementService managementService = cpSubsystem.getCPSubsystemManagementService();
            CompletionStage<Collection<CPMember>> future = managementService.getCPMembers();
            Collection<CPMember> members = future.toCompletableFuture().get();
    REST API
    > curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members
    OR
    > sh cp-subsystem.sh -o get-members --address 127.0.0.1 --port 5701
    +
    Sample Response:
    [{
        "uuid": "33f84b0f-46ba-4a41-9e0a-29ee284c1c2a",
        "address": "[127.0.0.1]:5703"
    }, {
        "uuid": "59ca804c-312c-4cd6-95ff-906b2db13acb",
        "address": "[127.0.0.1]:5704"
    }, {
        "uuid": "777ff6ea-b8a3-478d-9642-47d1db019b37",
        "address": "[127.0.0.1]:5705"
    }, {
        "uuid": "c6229b44-8976-4602-bb57-d13cf743ccef",
        "address": "[127.0.0.1]:5701"
    }, {
        "uuid": "c7856e0f-25d2-4717-9919-88fb3ecb3384",
        "address": "[127.0.0.1]:5702"
    }]
  • Force Destroy a CP Group:

    Unconditionally destroys the given active CP group without using the Raft algorithm mechanics. This method must be used only when a CP group loses its majority and cannot make progress anymore. Normally, membership changes in CP groups, such as CP member promotion or removal, are done via the Raft consensus algorithm. However, when a CP group permanently loses its majority, it will not be able to commit any new operation. Therefore, this method ungracefully terminates the remaining members of the given CP group on the remaining CP group members. It also performs a Raft commit to the METADATA CP group in order to update the status of the destroyed group. Once a CP group is destroyed, all CP data structure proxies created before the destroy fails with CPGroupDestroyedException. However, if a new proxy is created afterwards, then this CP group is re-created from scratch with a new set of CP members.

    This method is idempotent. It has no effect if the given CP group is already destroyed.

    Java API
            CPSubsystemManagementService managementService = cpSubsystem.getCPSubsystemManagementService();
            CompletionStage<Void> future = managementService.forceDestroyCPGroup(groupName);
            future.toCompletableFuture().get();
    REST API
    > curl -X POST --data "${GROUPNAME}&$\{PASSWORD}" http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/groups/$\{CPGROUP_NAME}/remove
    OR
    > sh cp-subsystem.sh -o force-destroy-group --group $\{CPGROUP_NAME} --address 127.0.0.1 --port 5701 --groupname ${GROUPNAME} --password $\{PASSWORD}
  • Remove a CP Member:

    Removes the given unreachable CP member from the active CP members list and all CP groups it belongs to. If any other active CP member is available, it replaces the removed CP member in its CP groups. Otherwise, CP groups which the removed CP member is a member of shrinks and their majority values are recalculated.

    Before removing a CP member from CP Subsystem, please make sure that it is declared as unreachable by Hazelcast’s failure detector and removed from Hazelcast’s member list. The behavior is undefined when a running CP member is removed from CP Subsystem.
    Java API
            CPSubsystemManagementService managementService = cpSubsystem.getCPSubsystemManagementService();
            CompletionStage<Void> future = managementService.removeCPMember(memberUUID);
            future.toCompletableFuture().get();
    REST API
    > curl -X POST --data "${GROUPNAME}&$\{PASSWORD}" http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members/$\{CPMEMBER_UUID}/remove
    OR
    > sh cp-subsystem.sh -o remove-member --member $\{CPMEMBER_UUID} --address 127.0.0.1 --port 5701 --groupname ${GROUPNAME} --password $\{PASSWORD}
  • Promote Local Member to a CP Member

    Promotes the local Hazelcast member to the CP member. If the local member is already in the active CP members list, i.e., it is already a CP member, then this method has no effect. When the local member is promoted to the CP role, its member UUID is assigned as CP member UUID. The promoted CP member will be added to the CP groups that have missing members, i.e., whose current size is smaller than CPSubsystemConfig.getGroupSize().

    Java API
            CPSubsystemManagementService managementService = cpSubsystem.getCPSubsystemManagementService();
            CompletionStage<Void> future = managementService.promoteToCPMember();
            future.toCompletableFuture().get();
    REST API
    > curl -X POST --data "${GROUPNAME}&$\{PASSWORD}" http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/members
    OR
    > sh cp-subsystem.sh -o promote-member --address 127.0.0.1 --port 5701 --groupname ${GROUPNAME} --password $\{PASSWORD}
  • Wipe and Reset CP Subsystem

    Wipes and resets the whole CP Subsystem state and initializes it as if the Hazelcast cluster is starting up initially. This method must be used only when the METADATA CP group loses its majority and cannot make progress anymore.

    After this method is called, all CP state and data are wiped and CP members start with empty state.

    This method can be invoked only from the Hazelcast master member, which is the first member in the Hazelcast cluster member list. Moreover, the Hazelcast cluster must have at least CPSubsystemConfig.getCPMemberCount() members.

    This method must not be called while there are membership changes in the Hazelcast cluster. Before calling this method, please make sure that there is no new member joining and all existing Hazelcast members have seen the same member list.

    To be able to use this method, the initial CP member count of CP Subsystem, which is defined by CPSubsystemConfig.getCPMemberCount(), must be satisfied. For instance, if CPSubsystemConfig.getCPMemberCount() is 5 and only 1 CP member is alive, when this method is called, 4 additional AP Hazelcast members should exist in the cluster, or new Hazelcast members must be started.

    This method also deletes all data written by CP Subsystem Persistence.

    This method triggers a new CP discovery process round. However, if the new CP discovery round fails for any reason, Hazelcast members are not terminated, because Hazelcast members are likely to contain data for AP data structures and their termination can cause data loss. Hence, you need to observe the cluster and check if the CP discovery process completes successfully.

    This method is NOT idempotent and multiple invocations can break the whole system! After calling this API, you must observe the system to see if the reset process is successfully completed or failed before making another call.
    This method deletes all CP data written by CP Subsystem Persistence.
    Java API
            CPSubsystemManagementService managementService = cpSubsystem.getCPSubsystemManagementService();
            CompletionStage<Void> future = managementService.reset();
            future.toCompletableFuture().get();
    REST API
    > curl -X POST --data "${GROUPNAME}&$\{PASSWORD}" http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/reset
    OR
    > sh cp-subsystem.sh -o reset --address 127.0.0.1 --port 5701 --groupname ${GROUPNAME} --password $\{PASSWORD}

Session Management API

There are two management API methods for session management.

  • Get CP Group Sessions:

    Returns all CP sessions that are currently active in a CP group.

    Java API
            CPSessionManagementService sessionManagementService = cpSubsystem.getCPSessionManagementService();
            CompletionStage<Collection<CPSession>> future = sessionManagementService.getAllSessions(groupName);
            Collection<CPSession> sessions = future.toCompletableFuture().get();
    REST API
    > curl http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/groups/$\{CPGROUP_NAME}/sessions
    OR
    > sh cp-subsystem.sh -o get-sessions --group $\{CPGROUP_NAME} --address 127.0.0.1 --port 5701
    +
    Sample Response:
    [{
        "id": 1,
        "creationTime": 1549008095530,
        "expirationTime": 1549008766630,
        "version": 73,
        "endpoint": "[127.0.0.1]:5701",
        "endpointType": "SERVER",
        "endpointName": "hz-member-1"
    }, {
        "id": 2,
        "creationTime": 1549008115419,
        "expirationTime": 1549008765425,
        "version": 71,
        "endpoint": "[127.0.0.1]:5702",
        "endpointType": "SERVER",
        "endpointName": "hz-member-2"
    }]
  • Force Close a Session:

    If a Hazelcast instance that owns a CP session crashes, its CP session is not terminated immediately. Instead, the session is closed after CPSubsystemConfig.getSessionTimeToLiveSeconds() passes. If it is known for sure that the session owner is not partitioned and definitely crashed, this method can be used for closing the session and releasing its resources immediately.

    Java API
            CPSessionManagementService sessionManagementService = cpSubsystem.getCPSessionManagementService();
            CompletionStage<Boolean> future = sessionManagementService.forceCloseSession(groupName, sessionId);
            future.toCompletableFuture().get();
    REST API
    > curl -X POST --data "${GROUPNAME}&$\{PASSWORD}" http://127.0.0.1:5701/hazelcast/rest/cp-subsystem/groups/$\{CPGROUP_NAME}/sessions/$\{CP_SESSION_ID}/remove
    OR
    > sh cp-subsystem.sh -o force-close-session --group $\{CPGROUP_NAME} --session-id $\{CP_SESSION_ID} --address 127.0.0.1 --port 5701 --groupname ${GROUPNAME} --password $\{PASSWORD}