Configuring CP Subsystem
You can configure clusters to enable the CP Subsystem as well as fine-tune many other options such as CP group sizes and persistence of CP state.
Quickstart Configuration
Use this quickstart to test CP Subsystem in development.
Running clusters must be restarted before any configuration changes take effect.
CP Subsystem is disabled by default, and any CP data structures operate in unsafe mode. To enable CP Subsystem, configure a non-zero value for the cp-member-count option. In programmatic configuration, use the CPSubsystemConfig object.
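For example, the following is a minimal programmatic sketch (the class name, member count, and data structure name are illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class CpQuickstart {
    public static void main(String[] args) {
        Config config = new Config();
        // A non-zero cp-member-count enables CP Subsystem; the first three
        // members that start with this configuration become CP members.
        config.getCPSubsystemConfig().setCPMemberCount(3);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        // CP data structures such as AtomicLong are now strongly consistent.
        hz.getCPSubsystem().getAtomicLong("counter").incrementAndGet();
    }
}
```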
Persisting CP Data Structures
Enterprise Edition
To enable CP Subsystem Persistence, set the persistence-enabled option to true.
When CP Subsystem Persistence is enabled, all Hazelcast cluster members, including AP members, create a sub-directory under the base cp-data directory. CP members persist CP state in these sub-directories. AP members persist only their status as non-CP members.
If you have both CP and AP members in your cluster when CP Subsystem Persistence is enabled, and if you want to perform a cluster-wide restart, you need to ensure that AP members are also restarted with their CP persistence stores.
To change the base directory, set the base-dir option.
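As a sketch, assuming programmatic configuration, CP Subsystem Persistence and a custom base directory could be set as follows (the class name and directory path are illustrative):

```java
import java.io.File;

import com.hazelcast.config.Config;
import com.hazelcast.config.cp.CPSubsystemConfig;

public class CpPersistenceConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        CPSubsystemConfig cpConfig = config.getCPSubsystemConfig();
        cpConfig.setCPMemberCount(3);
        // Enable CP Subsystem Persistence (Enterprise Edition).
        cpConfig.setPersistenceEnabled(true);
        // Point it at a custom base directory (illustrative path).
        cpConfig.setBaseDir(new File("/var/lib/hazelcast/cp-data"));
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```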
CP Subsystem Persistence and AP Persistence
As well as CP Subsystem Persistence, Hazelcast offers Persistence for some AP data structures. If you persist AP and CP data structures in a single Hazelcast cluster, be aware that member or cluster restarts can fail because of either the persisted AP data structures or the CP data structures.
Choosing a Group Size
For all CP groups, you can set the number of CP members that should participate in each one, using the group-size option.
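For example, a sketch of a five-member CP Subsystem where each CP group is replicated on three of the members (the class name is illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.CPSubsystemConfig;

public class CpGroupSizeExample {
    public static void main(String[] args) {
        Config config = new Config();
        CPSubsystemConfig cpConfig = config.getCPSubsystemConfig();
        // 5 CP members in total, each CP group placed on 3 of them.
        cpConfig.setCPMemberCount(5);
        cpConfig.setGroupSize(3);
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```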
To scale out throughput and memory capacity, you can choose a CP group size that is smaller than the CP member count to distribute your CP data structures to multiple CP groups.
CP groups must consist of three, five, or seven members. An odd number of CP members avoids stalemate scenarios in leadership elections and quorum calculations.
For a CP group of N members:
- the majority of members is calculated as (N + 1) / 2.
- the number of failing members you wish to tolerate is calculated as (N - 1) / 2.
For example, in a CP group of five CP members, operations are committed when they are replicated to at least three CP members. This CP group can tolerate the failure of two CP members and remain available.
Creating separate CP groups allows clients to distribute the load among the leaders of several CP groups. However, a large number of CP groups running on the same members may slow down the system. A good compromise is to create the same number of CP groups as CP members for high throughput use cases. That way, each member will usually be the leader of one group.
You should destroy any unused CP groups.
Configuring Leadership Priority
Some CP members are better leadership candidates than others. For example, members in your local data center are better candidates because of the reduced latency to clients, and members under high load are poor candidates because they are more likely to suffer from downtime. To maximize the availability and throughput of the CP Subsystem, you can configure CP members with a priority rating, using the cp-member-priority option.
On startup, any CP member with any priority can become a leader and cp-member-priority has no effect. The background leadership rebalancing task periodically transfers the leadership from CP members with the lowest priorities to those with higher priorities within each CP group. Eventually, all CP group leaders are CP members with high priorities.
Only CP members that are part of a CP group can become leaders. CP members that are not in a CP group do not participate in the leadership rebalancing task. For example, you start a cluster with the CP group size set to three and all members with priority 0. Later, you promote a new CP member with priority 100. Even though it has a higher priority, the new CP member never becomes a leader because the CP group is full. The new CP member is eligible for leadership only when a CP member leaves the CP group and the new CP member takes the previous member’s place.
cp-member-priority does not override the Raft leadership election process: it is a hint rather than a rule. Consequently, it is possible that a member with a low priority becomes leader for a period until the background leadership rebalancing task transfers the leadership to a member with a higher priority. The leadership rebalancing task runs approximately every 60 seconds; however, a custom frequency can be defined by the system property hazelcast.raft.leadership.rebalance.period. If necessary, you can also configure members to transfer leadership to another member if elected.
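As a sketch, assuming CPSubsystemConfig exposes a setCPMemberPriority setter for this option, a member in a local data center could be marked as a preferred leader (the class name and priority value are illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.CPSubsystemConfig;

public class CpMemberPriorityExample {
    public static void main(String[] args) {
        Config config = new Config();
        CPSubsystemConfig cpConfig = config.getCPSubsystemConfig();
        cpConfig.setCPMemberCount(3);
        // Give this member a higher leadership priority than the default.
        // Assumes a setCPMemberPriority setter on CPSubsystemConfig.
        cpConfig.setCPMemberPriority(100); // illustrative priority value
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```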
Configuring Members to Transfer Leadership
In some deployment scenarios, you may want to prevent a CP group member from becoming the leader. For example, if your CP group is distributed across two local data centers and one remote data center for geographic redundancy, there may be a significantly higher latency to the remote data center.
If the CP group leader is in one of the local data centers — and there are enough local followers to achieve quorum — the leader will not need to wait for acknowledgements from followers in the remote data center and so the increased latency will have no impact. If the leader is in the remote data center, there will be a higher latency for client operations and this will reduce the maximum throughput achievable. To maximize throughput, you can force CP members in the remote data center to automatically transfer leadership to another member by enabling auto-step-down-when-leader. The increased latency will then only have an impact if a remote data center member is needed to achieve quorum.
In the Raft algorithm, a member must become leader if it has the most recent log in the group. In this case, a member with auto-step-down-when-leader enabled will become leader but immediately trigger the leadership rebalancing task, and will reject client operations. If the leadership rebalancing task fails, it will retry automatically. Client operations continue to be rejected until a new leader is elected.
Reducing the number of leadership candidates reduces the fault tolerance of the CP Subsystem. You should therefore only enable this feature if it is necessary to achieve your throughput requirements, and you should enable it on as few members as possible. This feature does not apply to the METADATA group, which is not sensitive to latency.
Configuring CP Sessions
Sessions offer a trade-off between liveliness and safety. If you set a small value for the session-time-to-live-seconds option, a session owner could be considered crashed very quickly and its resources could be released prematurely. On the other hand, if you set a large value, a session could be kept alive for an unnecessarily long duration even if its owner actually crashes. It is generally safer not to use a small session-time-to-live-seconds duration; if a session owner is known to have crashed, its session can be closed manually.
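For example, a sketch that detects crashed session owners faster by shortening the time-to-live (the class name and values are illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.CPSubsystemConfig;

public class CpSessionConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        CPSubsystemConfig cpConfig = config.getCPSubsystemConfig();
        // Close CP sessions after 60 seconds without a heartbeat.
        cpConfig.setSessionTimeToLiveSeconds(60); // must exceed the heartbeat interval
        cpConfig.setSessionHeartbeatIntervalSeconds(5);
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```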
Configuring Fenced Locks
By default, fenced locks are reentrant. When a caller acquires the lock, it can acquire the lock reentrantly as many times as it wants in a linearizable manner.
You can configure the reentrancy behavior using the lock-acquire-limit option. For example, you can disable reentrancy by setting this option to 1, making the lock a non-reentrant mutex. You can also set a custom reentrancy limit. When the reentrancy limit is reached, the fenced lock does not block a lock call; instead, it fails with LockAcquireLimitReachedException or a specified return value.
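As a sketch, the following disables reentrancy for one lock and shows the resulting behavior (the class name and lock name are illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.FencedLockConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.lock.FencedLock;

public class FencedLockConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.getCPSubsystemConfig().setCPMemberCount(3);
        // A non-reentrant fenced lock (lock-acquire-limit = 1).
        config.getCPSubsystemConfig().addLockConfig(new FencedLockConfig("orders-lock", 1));

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        FencedLock lock = hz.getCPSubsystem().getLock("orders-lock");
        lock.lock();
        try {
            // Critical section. A second lock() call here would fail with
            // LockAcquireLimitReachedException instead of acquiring reentrantly.
        } finally {
            lock.unlock();
        }
    }
}
```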
Configuring Semaphores
By default, a caller must acquire permits before releasing them and it cannot release a permit that it has not acquired. This means that you can acquire a permit from one thread and release it from another thread, using the same caller, but not different callers. In this mode, acquired permits are automatically released upon failure of the caller.
To enable a permit to be released without acquiring it first, enable JDK compatibility by setting the jdk-compatibility option to true. In this mode, acquired permits are not bound to threads.
When jdk-compatibility is set to true, Hazelcast does not automatically clean up acquired permits upon caller failures. If a permit holder fails, its permits must be released manually.
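As a sketch, a JDK-compatible semaphore could be configured through the SemaphoreConfig object as follows (the class name, semaphore name, and permit count are illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.SemaphoreConfig;

public class CpSemaphoreConfigExample {
    public static void main(String[] args) {
        // A JDK-compatible semaphore: permits can be released without being
        // acquired first, but are never auto-released if their holder crashes.
        SemaphoreConfig semaphoreConfig = new SemaphoreConfig("jobs-semaphore");
        semaphoreConfig.setJDKCompatible(true);
        semaphoreConfig.setInitialPermits(10);

        Config config = new Config();
        config.getCPSubsystemConfig().addSemaphoreConfig(semaphoreConfig);
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```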
Removing Missing CP Members Automatically
If CP Subsystem Persistence is disabled, CP members lose their state after shutting down and so cannot rejoin the CP Subsystem. You can configure CP members to be automatically removed from the CP Subsystem after they shut down, as well as how long to wait before removing them.
By default, missing CP members are automatically removed from the CP Subsystem after 4 hours and replaced with other available CP members in all their CP groups. You can configure this time using the missing-cp-member-auto-removal-seconds option.
If a missing CP member rejoins the cluster after it is automatically removed from the CP Subsystem, that CP member must be terminated manually.
If no CP members are available to replace a missing CP member, the group size of any groups that it was in is reduced and the majority values are recalculated.
When CP Subsystem Persistence is enabled, CP members are not automatically removed from the CP Subsystem. These CP members can restore their CP state from disk and rejoin their CP groups. It is your responsibility to remove CP members if they do not restart.
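For example, a sketch that shortens the automatic removal window from the default of 4 hours to 30 minutes (the class name and value are illustrative):

```java
import com.hazelcast.config.Config;

public class MissingCpMemberRemovalExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Remove missing CP members after 30 minutes instead of 4 hours.
        config.getCPSubsystemConfig().setMissingCPMemberAutoRemovalSeconds(30 * 60);
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```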
Handling Indeterminate Operation State
When you invoke an API method on a CP data structure, the method replicates an internal operation to the corresponding CP group. After the CP leader commits this operation to the majority of the CP group, it sends a response to the public API call. If a failure causes loss of the response, then the caller cannot determine if the operation is committed on the CP group or not.
You can handle loss of the response in two ways, as shown in the configuration example after this list:
- To allow CP leaders to replicate the operation to the CP group multiple times, set the fail-on-indeterminate-operation-state option to false (default).
- To send an IndeterminateOperationStateException back to the caller, set the fail-on-indeterminate-operation-state option to true.
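For example, a sketch that opts in to the stricter behavior so that callers are notified instead of the operation being retried transparently (the class name is illustrative):

```java
import com.hazelcast.config.Config;

public class IndeterminateStateConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        // Surface IndeterminateOperationStateException to callers rather than
        // letting CP leaders replicate the operation again transparently.
        config.getCPSubsystemConfig().setFailOnIndeterminateOperationState(true);
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```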
Global Configuration Options
Use these options to configure the CP Subsystem.
Option | Description
---|---
cp-member-count | Number of CP members to initialize the CP Subsystem. If set, must be greater than or equal to group-size.
group-size | Number of CP members to participate in each CP group. If set, must be an odd number between 3 and 7, and must not be larger than cp-member-count.
session-time-to-live-seconds | Duration for a CP session to be kept alive after the last received heartbeat. A CP session is closed if no session heartbeat is received during this duration. Must be greater than session-heartbeat-interval-seconds and smaller than or equal to missing-cp-member-auto-removal-seconds.
session-heartbeat-interval-seconds | Interval in seconds for the periodically committed CP session heartbeats. Must be smaller than session-time-to-live-seconds.
missing-cp-member-auto-removal-seconds | Duration in seconds to wait before automatically removing a missing CP member from the CP Subsystem. Must be greater than or equal to session-time-to-live-seconds.
fail-on-indeterminate-operation-state | Whether CP Subsystem operations use at-least-once or at-most-once execution guarantees. By default, operations use an at-least-once execution guarantee. If set to true, the caller receives an IndeterminateOperationStateException when the outcome of an operation becomes indeterminate.
persistence-enabled | Whether CP Subsystem Persistence is globally enabled for CP groups created in the CP Subsystem. If enabled, CP members persist their local CP data to stable storage and can recover from crashes.
base-dir | Parent directory where persisted CP data is stored. This directory is created automatically if it does not exist and can safely be shared among multiple CP members, which is especially useful in cloud environments where CP members generally use a shared filesystem.
data-load-timeout-seconds | Timeout duration in seconds for CP members to restore their persisted data from disk. A CP member fails its startup if it cannot complete its CP data restore process in the configured duration.
Fenced Lock Options
Use the following options to configure fenced locks:
Add configuration options to the FencedLockConfig object.
Option | Description
---|---
name | The name of the fenced lock to configure.
lock-acquire-limit | Maximum number of reentrant lock acquisitions. Once a caller acquires the lock this many times, it will not be able to acquire the lock again until it makes at least one release.
Semaphore Options
Use the following options to configure semaphores:
Add configuration options to the SemaphoreConfig object.
Option | Description
---|---
name | Name of the semaphore.
jdk-compatibility | Whether JDK compatibility is enabled. See Semaphores.
initial-permits | Number of permits to initialize the semaphore. If a positive value is set, the semaphore is initialized with the given number of permits.
CPMap Options
Use the following options to configure CPMap instances:
Add configuration options to the CPMapConfig object.
Option | Description
---|---
name | Name of the CPMap.
max-size-mb | Maximum permitted size in MB for the totality of key-value data. The maximum permitted size is 2000 MB.
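As a sketch, assuming the CPMapConfig class and the CPSubsystem.getMap accessor available in recent Hazelcast releases, a CPMap could be configured and used as follows (the class name, map name, size, and entries are illustrative):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.CPMapConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.cp.CPMap;

public class CpMapConfigExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.getCPSubsystemConfig().setCPMemberCount(3);

        // A CPMap limited to 50 MB of key-value data (assumed config class).
        CPMapConfig mapConfig = new CPMapConfig("inventory");
        mapConfig.setMaxSizeMb(50);
        config.getCPSubsystemConfig().addCPMapConfig(mapConfig);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        CPMap<String, Integer> inventory = hz.getCPSubsystem().getMap("inventory");
        inventory.set("widgets", 100);
    }
}
```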
Raft Algorithm Options
Use these options to fine-tune the Raft consensus algorithm.
Do not change these settings unless you know what you’re doing.
Add configuration options to the RaftAlgorithmConfig object.
Option | Description
---|---
leader-election-timeout-in-millis | Leader election timeout in milliseconds. If a candidate cannot win the majority of the votes in time, a new election round is initiated.
leader-heartbeat-period-in-millis | Duration in milliseconds for a CP group leader to send periodic heartbeat messages to its followers to indicate its liveliness. Periodic heartbeat messages are actually append entries requests and can contain log entries for lagging followers. If too small a value is set, heartbeat messages are sent from leaders to followers too frequently, which can cause unnecessary CPU and network bandwidth usage.
max-missed-leader-heartbeat-count | Maximum number of missed CP group leader heartbeats for a follower to trigger a new leader election round. For example, if this value is 5, a follower triggers a new leader election round if it receives no heartbeat from the leader for 5 heartbeat periods.
append-request-max-entry-count | Maximum number of Raft log entries that can be sent as a batch in a single append entries request. In Hazelcast’s Raft consensus algorithm implementation, a leader maintains a separate replication pipeline for each follower. It sends a new batch of Raft log entries to a follower after the follower acknowledges the last append entries request sent by the leader.
commit-index-advance-count-to-snapshot | Number of new commits to initiate a new snapshot after the last snapshot taken by the local CP group member. This value affects performance of the system in multiple ways. If a small value is set, snapshots are taken too frequently and CP group members keep a very short Raft log. If snapshots are large and CP Subsystem Persistence is enabled, this can create a high overhead on I/O performance. Moreover, a leader can send too many snapshots to followers, which can create a high overhead on network use. On the other hand, if a very large value is set, it can create a memory overhead since Raft log entries are kept in memory until the next snapshot.
uncommitted-entry-count-to-reject-new-appends | Maximum number of uncommitted log entries in the leader’s Raft log before temporarily rejecting new requests of callers. Because leaders send log entries to followers in batches, they accumulate incoming requests in order to improve throughput. Configure this option based on the degree of concurrency of your callers.
append-request-backoff-timeout-in-millis | Timeout duration in milliseconds to apply backoff on append entries requests. After a CP group leader sends an append entries request to a follower, it will not send a subsequent append entries request either until the follower responds or this timeout occurs. Backoff durations are increased exponentially if followers remain unresponsive.
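For example, a minimal sketch that adjusts a few of these values through the RaftAlgorithmConfig object (the class name and values are illustrative; the defaults are usually sufficient):

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.RaftAlgorithmConfig;

public class RaftAlgorithmConfigExample {
    public static void main(String[] args) {
        // Tune leader election and heartbeat timings for the Raft algorithm.
        RaftAlgorithmConfig raftConfig = new RaftAlgorithmConfig();
        raftConfig.setLeaderElectionTimeoutInMillis(3000);
        raftConfig.setLeaderHeartbeatPeriodInMillis(1000);
        raftConfig.setMaxMissedLeaderHeartbeatCount(10);

        Config config = new Config();
        config.getCPSubsystemConfig().setRaftAlgorithmConfig(raftConfig);
        // Pass this config to Hazelcast.newHazelcastInstance(config).
    }
}
```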
Per-Member Configuration Options
Use these options to configure individual CP members.
Option | Description
---|---
cp-member-priority | The priority rating as a positive or negative integer. The leader role is eventually transferred to members with higher priorities within a CP group.
auto-step-down-when-leader | Whether this member should automatically step down if elected leader. If enabled, the member triggers the leadership rebalancing task immediately after being elected and rejects client operations until a new leader is elected.