You can configure clusters to enable the CP Subsystem and adjust other options such as CP groups sizes and persistence of CP state.
Quickstart
Use this quickstart to test the CP Subsystem in development. The CP Subsystem is disabled by default, with any CP data structures operating in unsafe mode.
| Running clusters must be restarted before any configuration changes take effect. | 
- 
To enable the CP Subsystem, configure a non-zero value for the cp-member-countoption:<hazelcast> <cp-subsystem> <cp-member-count>3</cp-member-count> <!-- configuration options here --> </cp-subsystem> </hazelcast>hazelcast: cp-subsystem: cp-member-count: 3 # configuration options hereUse the CPSubsystemConfigobject.Config config = new Config(); config.getCPSubsystemConfig() .setCPMemberCount(3)
Persist CP data structures
To enable CP Subsystem persistence, set the persistence-enabled option to true.
When CP Subsystem persistence is enabled, all Hazelcast cluster members — including AP members — create a directory under the base cp-data directory. CP members persist CP state in these sub-directories. AP members persist only their status as non-CP members. Ensure that you do not delete these directories unless you need to remove stale data to recover the system as described in CP Subsystem persistence and AP persistence.
To change the base directory, set a value in the base-dir option.
CP Subsystem persistence and AP persistence
Hazelcast also offers persistence for some AP data structures. If you persist AP and CP data structures in a single Hazelcast cluster, be aware that member or cluster restarts can fail if the persisted data (CP or AP) becomes stale.
You can configure AP members to automatically remove stale data with the auto-remove-stale-data option — see Configuring persistence. Otherwise, you will need to remove stale data manually.
CP members can usually rejoin the cluster after a restart. The CP group leader will reissue any missing operations to catch up the rejoining member’s Raft log. However, if a CP member gets significantly out of sync, you may need to remove stale data manually.
Choose a group size
For all CP groups, you can set the number of CP members that should participate in each one using the group-size option.
To scale out throughput and memory capacity, you can choose a CP group size that is smaller than the CP member count to distribute your CP data structures to multiple CP groups.
| If you want to distribute CP data to a group other than DEFAULT, you must specify the group name when creating or fetching the CP data structure proxy. | 
CP groups must consist of three, five, or seven members. An odd number of CP members avoids stalemate scenarios in leadership elections and quorum calculations.
For a CP group of N members:
- 
the majority of members is calculated as (N + 1) / 2.
- 
the number of failing members the group can tolerate is calculated as (N - 1) / 2.
For example, in a CP group of five CP members, operations are committed when they are replicated to at least three CP members. This CP group can tolerate the failure of two CP members and remain available.
Creating separate CP groups allows clients to distribute the load among the leaders of several CP groups. However, a large number of CP groups running on the same members may slow down the system. A good compromise is to create the same number of CP groups as CP members for high throughput use cases. That way, each member will usually be the leader of one group.
You should destroy any unused CP groups.
Configure leadership priority
Some CP members are better leadership candidates than others. For example, members in your local data center are better candidates because of the reduced latency to clients, and members under high load are poor candidates because they are more likely to suffer from downtime. To maximise the availability and throughput of the CP subsystem, you can configure CP members with a priority rating, using the cp-member-priority option.
On startup, any CP member with any priority can become a leader and cp-member-priority has no effect. The background leadership rebalancing task periodically transfers the leadership from CP members with the lowest priorities to those with higher priorities within each CP group. Eventually, all CP group leaders are CP members with high priorities.
Only CP members that are part of a CP group can become leaders. CP members that are not in a CP group do not participate in the leadership rebalancing task. For example, you start a cluster with the CP group size set to three and all members with priority 0. Later, you configure a new CP member with priority 100. Even though it has a higher priority, the new CP member never becomes a leader because the CP group is full. The new CP member is eligible for leadership only when a CP member leaves the CP group and the new CP member takes the previous member’s place.
| cp-member-prioritydoes not override the Raft leadership election process: it is a hint rather than a rule. Consequently, it is possible that a member with a low priority becomes leader for a period until the background leadership rebalancing task transfers the leadership to a member with a higher priority. The leadership rebalancing task runs approximately every 60 seconds; however, a custom frequency can be defined by the system propertyhazelcast.raft.leadership.rebalance.period. If necessary, you can also configure members to transfer leadership to another member if elected. | 
Configure members to transfer leadership
In some deployment scenarios, you may want to prevent a CP group member from becoming the leader. For example, if your CP group is distributed across two local data centers and one remote data center for geographic redundancy, there may be a significantly higher latency to the remote data center.
If the CP group leader is in one of the local data centers — and there are enough local followers to achieve quorum — the leader will not need to wait for acknowledgements from followers in the remote data center and so the increased latency will have no impact. If the leader is in the remote data center, there will be a higher latency for client operations and this will reduce the maximum throughput achievable. To maximize throughput, you can force CP members in the remote data center to automatically transfer leadership to another member by enabling auto-step-down-when-leader. The increased latency will then only have an impact if a remote data center member is needed to achieve quorum.
In the Raft algorithm, a member must become leader if it has the most recent log in the group. In this case, a member with auto-step-down-when-leader enabled will become leader but immediately trigger the leadership rebalancing task, and will reject client operations. If the leadership rebalancing task fails, it will retry automatically. Client operations continue to be rejected until a new leader is elected.
Reducing the number of leadership candidates reduces the fault tolerance of the CP Subsystem. You should therefore only enable this feature if it is necessary to achieve your throughput requirements, and you should enable it on as few members as possible. This feature does not apply to the METADATA group, which is not sensitive to latency.
Configure CP sessions
Sessions offer a trade-off between liveness and safety. If you set a small value for the session-time-to-live-seconds option, a session owner could be considered crashed very quickly and its resources could be released prematurely. On the other hand, if you set a large value, a session could be kept alive for an unnecessarily long duration after its owner crashes. We recommend using the default session-time-to-live-seconds duration. If a session owner is known to have crashed, you can close its session manually by calling this method on the handler side: CPSessionManagementService#forceCloseSession(String, Long).
Configure fenced locks
By default, fenced locks are reentrant. This means that once a caller has acquired the lock, the same caller can acquire it again multiple times in a linearizable manner without releasing it first.
You can configure the reentrancy behavior with the lock-acquire-limit option. For example, reentrancy can be disabled by setting this option to 1, making the lock a non-reentrant mutex. You can also set a custom reentrancy limit. When the reentrancy limit is reached, the fenced lock fails with LockAcquireLimitReachedException or a specified return value.
Configure semaphores
By default, a caller must acquire permits before releasing them and it cannot release a permit that it has not acquired. This means that you can acquire a permit from one thread and release it from another thread, using the same caller, but not different callers. In this mode, acquired permits are automatically released upon failure of the caller.
To enable a permit to be released without acquiring it first, enable JDK compatibility by setting the jdk-compatibility option to true.
| When jdk-compatibilityis set totrue, Hazelcast does not auto-cleanup acquired permits upon caller failures. If a permit holder fails, its permits must be released manually. | 
Remove missing CP members
If CP Subsystem persistence is disabled, CP members lose their state after shutting down and so cannot rejoin the CP Subsystem. You can configure CP members to be automatically removed from the CP Subsystem after they shut down as well as how long to wait after they shut down before removing them.
By default, missing CP members are automatically removed from the CP Subsystem after four hours and are replaced by other CP members if any are available. You can configure this time using the missing-cp-member-auto-removal-seconds option.
If a missing CP member rejoins the cluster after it is automatically removed from the CP Subsystem, that CP member must be terminated manually. See Shutting Down a Hazelcast Member.
If no CP members are available to replace a missing CP member, the group size of any groups that it was in is reduced and the majority values are recalculated. The CP group will continue to function as long as a majority of the configured number of CP members remains available.
When CP Subsystem persistence is enabled, CP members are not automatically removed from the CP Subsystem. These CP members can restore their CP state from disk and rejoin their CP groups. It is your responsibility to remove CP members if they do not restart.
Handle indeterminate operation state
When you invoke an API method on a CP data structure, the method replicates an internal operation to the corresponding CP group. After the CP leader commits this operation to the majority of the CP group, it sends a response to the public API call. If a failure causes loss of the response, then the caller cannot determine if the operation is committed on the CP group or not.
You can handle loss of the response in two ways:
- 
To allow CP leaders to replicate the operation to the CP group multiple times, set the fail-on-indeterminate-operation-stateoption tofalse(default).
- 
To send an IndeterminateOperationStateExceptionback to the caller, set thefail-on-indeterminate-operation-stateoption totrue.
Global configuration options
Use these options to configure the CP Subsystem.
| Option | Description | Default | Example | ||
|---|---|---|---|---|---|
| Number of CP members to initialize the CP Subsystem. If set, must be greater than or equal to  | 
 | ||||
| Number of CP members to participate in each CP group. If set, this value must conform to the following rules:
- Must be  | 
 | ||||
| Duration for a CP session to be kept alive after the last received heartbeat. A CP session is closed if no session heartbeat is received during this duration. Must be greater than  | 
 | ||||
| Interval in seconds for the periodically committed CP session heartbeats. Must be smaller than  | 
 |  | |||
| Duration in seconds to wait before automatically removing a missing CP member from the CP Subsystem. Must be greater than or equal to  A value of  
 | 
 |  | |||
| Whether CP Subsystem operations use
at-least-once and at-most-once execution guarantees. By default, operations use an at-least-once
execution guarantee. If set to  | 
 |  | |||
| Whether CP Subsystem persistence is globally enabled for CP groups created in the CP Subsystem. If enabled, CP members persist their local CP data to stable storage and can recover from crashes. | 
 | ||||
| Parent directory where persisted CP data is stored. This directory is created automatically if it does not exist. This directory is shared among multiple CP members safely. This is especially useful for cloud environments where CP members generally use a shared filesystem. | 
 | ||||
| Timeout duration in seconds for CP members to restore their persisted data from disk. A CP member fails its startup if it cannot complete its CP data restore process in the configured duration. | 
 | 
Fenced lock options
Use the following options to configure fenced locks:
<hazelcast>
  <cp-subsystem>
    <locks>
      <fenced-lock>
        <!-- insert configuration options here -->
      </fenced-lock>
    </locks>
  </cp-subsystem>
</hazelcast>hazelcast:
  cp-subsystem:
    locks:
      # insert configuration options hereAdd configuration options to the FencedLockConfig object.
Config config = new Config();
FencedLockConfig lockConfig = new FencedLockConfig(/*options*/);
config.getCPSubsystemConfig().addLockConfig(lockConfig);| Option | Description | Default | Example | 
|---|---|---|---|
| The name of the fenced lock to configure. | 
 |  | |
| Maximum number of reentrant lock acquisitions. Once a caller acquires the lock this many times, it will not be able to acquire the lock again, until it makes at least one  | 
 |  | 
Semaphore options
Use the following options to configure semaphores:
<hazelcast>
  <cp-subsystem>
    < semaphores >
      <semaphore>
        <!-- insert configuration options here -->
      </semaphore >
    </semaphores >
  </cp-subsystem>
</hazelcast>hazelcast:
  cp-subsystem:
    semaphores:
      # insert configuration options hereAdd configuration options to the SemaphoreConfig object.
Config config = new Config();
SemaphoreConfig semaphoreConfig = new SemaphoreConfig(/*insert configuration options here*/);
config.getCPSubsystemConfig().addSemaphoreConfig(semaphoreConfig);| Option | Description | Default | Example | 
|---|---|---|---|
| Name of the semaphore. | 
 |  | |
| Whether JDK compatibility is enabled. See Semaphores. | 
 |  | |
| Number of permits to initialize the semaphore. If a positive value is set, the semaphore is initialized with the given number of permits. | 
 |  | 
CPMap options
Use the following options to configure CPMap instances:
<hazelcast>
  <cp-subsystem>
    <maps>
      <map>
        <!-- insert configuration options here -->
      </map >
    </maps >
  </cp-subsystem>
</hazelcast>hazelcast:
  cp-subsystem:
    maps:
      # insert configuration options hereAdd configuration options to the CPMap object.
Config config = new Config();
CPMapConfig cpMapConfig = new CPMapConfig(/*insert configuration options here*/);
config.getCPSubsystemConfig().addCPMapConfig(cpMapConfig);| Option | Description | Default | Example | 
|---|---|---|---|
| Name of the CPMap. | 
 | ||
| Maximum permitted size in MB for the totality of key-value data. The maximum permitted size is 2000MB. | 
 |  | 
Raft algorithm options
Use these options to fine-tune the Raft consensus algorithm.
| Do not change these settings unless you know what you’re doing. | 
<hazelcast>
    <cp-subsystem>
        <raft-algorithm>
          <!-- insert configuration options here -->
        </raft-algorithm>
    </cp-subsystem>
</hazelcast>hazelcast:
  cp-subsystem:
    raft-algorithm:
      # insert configuration options hereAdd configuration options to the RaftAlgorithmConfig object.
Config config = new Config();
RaftAlgorithmConfig raftConfig = new RaftAlgorithmConfig();
config.getCPSubsystemConfig().setRaftAlgorithmConfig(raftConfig);| Option | Description | Default | Example | 
|---|---|---|---|
| Leader election timeout in milliseconds. If a candidate cannot win the majority of the votes in time, a new election round is initiated. | 
 |  | |
| Interval in milliseconds at which a CP group leader sends periodic heartbeat messages to its followers to indicate its liveness. Periodic heartbeat messages are actually append entries requests and can contain log entries for the lagging followers. If too small a value is set, heartbeat messages are sent from leaders to followers too frequently and it can cause an unnecessary usage of CPU and network bandwidth. | 
 |  | |
| Maximum number of missed CP group leader heartbeats for a follower to trigger a new leader election round. For example, if  | 
 |  | |
| Maximum number of Raft log entries that can be sent as a batch in a single append entries request. In Hazelcast’s Raft consensus algorithm implementation, a leader maintains a separate replication pipeline for each follower. It sends a new batch of Raft log entries to a follower after the follower acknowledges the last append entries request sent by the leader. | 
 |  | |
| Number of new commits to initiate a new snapshot after the last snapshot taken by the local CP group member. This value affects performance of the system in multiple ways. If a small value is set, it means that snapshots are taken too frequently and CP group members keep a very short Raft log. If snapshots are large and CP Subsystem persistence is enabled, this can create a high overhead on I/O performance. Moreover, a leader can send too many snapshots to followers and this can create a high overhead on network use. On the other hand, if a very large value is set, it can create a memory overhead since Raft log entries are going to be kept in memory until the next snapshot. | 
 |  | |
| Maximum number of
uncommitted log entries in the leader’s Raft log before temporarily rejecting
new requests of callers. Because leaders send log entries to followers in
batches, they accumulate incoming requests to improve the throughput.
You can configure this field by considering your degree of concurrency in your
callers. For instance, if you have at most  | 
 |  | |
| Timeout duration in milliseconds to apply backoff on append entries requests. After a CP group leader sends an append entries request to a follower, it will not send a subsequent append entries request either until the follower responds or this timeout occurs. Backoff durations are increased exponentially if followers remain unresponsive. | 
 |  | 
Per member configuration options
Use these options to configure individual CP members.
| Option | Description | Default | Example | 
|---|---|---|---|
| The priority rating as a positive or negative integer. The leader role is eventually transferred to members with higher priorities within a CP group. | 
 | ||
| Whether this member should automatically step down if elected leader. If enabled,  | 
 |