Configuring CP Subsystem
You can configure clusters to enable the CP Subsystem and fine-tune many other options, such as CP group sizes and persistence of CP state.
Quickstart Configuration
Use this quickstart to test CP Subsystem in development.
Before going into production, make sure to read the Production Checklist section.
By default, CP Subsystem runs in unsafe mode, which means it is disabled. To enable the CP Subsystem, you must first configure a non-zero value for the cp-member-count option:
<hazelcast>
  <cp-subsystem>
    <cp-member-count>3</cp-member-count>
    <!-- configuration options here -->
  </cp-subsystem>
</hazelcast>
hazelcast:
  cp-subsystem:
    cp-member-count: 3
    # configuration options here
Use the CPSubsystemConfig object.
Config config = new Config();
config.getCPSubsystemConfig()
      .setCPMemberCount(3);
Production Checklist
Before you start configuring members, consider the following checklist:
- Are you already persisting AP data structures in the same cluster?
Running clusters must be restarted before any configuration changes take effect.
Persisting CP Data Structures
Enterprise
To enable CP Subsystem Persistence, set the persistence-enabled option to true.
When CP Subsystem Persistence is enabled, all Hazelcast cluster members, including AP members, create a sub-directory under the base cp-data directory. CP members persist CP state in these sub-directories. AP members persist only their status as non-CP members.
If you have both CP and AP members in your cluster when CP Subsystem Persistence is enabled, and if you want to perform a cluster-wide restart, you need to ensure that AP members are also restarted with their CP persistence stores.
To change the base directory, set a value in the base-dir option.
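As a minimal sketch of the two options above in the Java API, the following fragment enables persistence and points it at a custom base directory. The class name and the directory path are hypothetical and chosen only for illustration; the setters are those of CPSubsystemConfig.

```java
import com.hazelcast.config.Config;
import java.io.File;

final class CpPersistenceConfigExample {
    private CpPersistenceConfigExample() { }

    static Config build() {
        Config config = new Config();
        config.getCPSubsystemConfig()
              // A non-zero member count is required to enable the CP Subsystem.
              .setCPMemberCount(3)
              // Corresponds to the persistence-enabled option.
              .setPersistenceEnabled(true)
              // Corresponds to the base-dir option; hypothetical path.
              .setBaseDir(new File("/var/lib/hazelcast/cp-data"));
        return config;
    }
}
```

Because every member, including AP members, creates its own sub-directory under this base directory, the path must be writable on all members.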
CP Subsystem Persistence and AP Persistence
As well as CP Subsystem Persistence, Hazelcast offers Persistence for some AP data structures. If you persist AP and CP data structures in a single Hazelcast cluster, be aware that member or cluster restarts can fail because of either the persisted AP data structures or the CP data structures.
Choosing a Group Size
For all CP groups, you can set the number of CP members that should participate in each one, using the group-size option.
To scale out throughput and memory capacity, you can choose a CP group size that is smaller than the CP member count to distribute your CP data structures to multiple CP groups.
CP groups usually consist of an odd number of CP members between three and seven, that is, three, five, or seven. An odd number of CP members in a group is preferable to an even number because of how the quorum, or majority, is calculated.
For a CP group of N members, where N is odd:
- the majority of members is calculated as (N + 1) / 2.
- the number of member failures that the group can tolerate is calculated as (N - 1) / 2.
For example, in a CP group of five CP members, operations are committed when they are replicated to at least three CP members. This CP group can tolerate the failure of two CP members and remain available.
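The arithmetic above can be sketched in a few lines of Java. The class and method names are hypothetical. The sketch uses the general majority formula floor(N / 2) + 1, which equals (N + 1) / 2 for odd N, so that even group sizes can be compared as well.

```java
/** Majority and fault-tolerance arithmetic for a CP group of n members. */
final class CpGroupQuorum {
    private CpGroupQuorum() { }

    /** Smallest number of members that forms a majority: floor(n / 2) + 1. */
    static int majority(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("group size must be positive");
        }
        return n / 2 + 1; // integer division rounds down
    }

    /** Number of members that can fail while a majority is still reachable. */
    static int tolerableFailures(int n) {
        return n - majority(n);
    }
}
```

With these definitions, a group of five has a majority of three and tolerates two failures. A group of four needs a majority of three and tolerates only one failure, the same as a group of three, which is why the extra member of an even-sized group adds replication cost without adding fault tolerance.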
We recommend that you create the minimum number of CP groups that you think are sufficient for your CP Subsystem use case. Creating separate CP groups allows clients to distribute the load among the leaders of several CP groups. However, a large number of CP groups running on the same members may slow down the system. If you have accidentally or intentionally created a large number of CP groups, and don’t have plans to use them anymore, you should destroy them.
Configuring Leadership Priority
Some CP members are better leadership candidates than others. For example, members in your primary data center make better leaders because of the reduced latency between clients and the leader, whereas members under high load are poor candidates because they are more likely to suffer from downtime. To ensure the availability of the CP Subsystem, you can configure CP members with a priority rating, using the cp-member-priority option.
Initially, any CP member with any priority can become a leader (cp-member-priority is therefore not taken into account during the leader election process). However, the background leadership rebalancing task periodically transfers the leadership from CP members with the lowest priorities to those with higher priorities within each CP group. Eventually, all CP group leaders are CP members with high priorities.
Only CP members that are part of a CP group can become leaders. CP members that are not in a CP group do not participate in the leadership rebalancing task. For example, you start a cluster with the CP group size set to three and three CP members with priority 0. Later, you promote a new CP member with priority 100. Even though it has a higher priority, the new CP member never becomes a leader because the CP group is full. The new CP member is eligible for leadership only when a CP member leaves the CP group and the new CP member takes the previous member’s place.
Configuring CP Sessions
Sessions offer a trade-off between liveness and safety. If you set a small value for the session-time-to-live-seconds option, a session owner could be considered crashed very quickly and its resources could be released prematurely. On the other hand, if you set a large value, a session could be kept alive for an unnecessarily long duration even if its owner actually crashes. However, it is safer not to use a small session-time-to-live-seconds value. If a session owner is known to have crashed, its session can be closed manually.
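A minimal sketch of setting the session time-to-live in the Java API follows. The class name is hypothetical and the values are illustrative only, not recommendations. The heartbeat interval is set alongside the time-to-live because the two settings are related: a session is closed when no heartbeat arrives within the time-to-live.

```java
import com.hazelcast.config.Config;

final class CpSessionConfigExample {
    private CpSessionConfigExample() { }

    static Config build() {
        Config config = new Config();
        config.getCPSubsystemConfig()
              .setCPMemberCount(3)
              // Corresponds to session-time-to-live-seconds; illustrative value.
              .setSessionTimeToLiveSeconds(300)
              // Interval of the periodic session heartbeats; illustrative value,
              // kept well below the time-to-live.
              .setSessionHeartbeatIntervalSeconds(5);
        return config;
    }
}
```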
Configuring Fenced Locks
By default, fenced locks are reentrant. When a caller acquires the lock, it can acquire the lock reentrantly as many times as it wants in a linearizable manner.
You can configure the reentrancy behavior in the lock-acquire-limit option. For example, reentrancy can be disabled by setting this option to 1, making the lock a non-reentrant mutex. You can also set a custom reentrancy limit. When the reentrancy limit is already reached, the fenced lock does not block a lock call. Instead, it fails with LockAcquireLimitReachedException or a specified return value.
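As a sketch of the non-reentrant case described above, the following fragment registers a fenced lock configuration with an acquire limit of 1. The class name and the lock name "orders-lock" are hypothetical.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.FencedLockConfig;

final class NonReentrantLockConfigExample {
    private NonReentrantLockConfigExample() { }

    static Config build() {
        // An acquire limit of 1 turns the fenced lock into a non-reentrant mutex.
        FencedLockConfig mutexConfig = new FencedLockConfig("orders-lock", 1);

        Config config = new Config();
        config.getCPSubsystemConfig()
              .setCPMemberCount(3)
              .addLockConfig(mutexConfig);
        return config;
    }
}
```

With this configuration, a second acquisition attempt by the current lock holder does not block. It fails as described above instead.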
Configuring Semaphores
By default, a caller must acquire permits before releasing them and it cannot release a permit that it has not acquired. This means that you can acquire a permit from one thread and release it from another thread, using the same caller, but not different callers. In this mode, acquired permits are automatically released upon failure of the caller.
To enable a permit to be released without acquiring it first, enable JDK compatibility by setting the jdk-compatibility option to true. This is possible because, in this mode, acquired permits are not bound to threads.
When jdk-compatibility is set to true, Hazelcast does not auto-cleanup acquired permits upon caller failures. If a permit holder fails, its permits must be released manually.
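A minimal Java sketch of a JDK-compatible semaphore configuration follows. It assumes that the setJDKCompatible setter of SemaphoreConfig is the Java counterpart of the JDK compatibility option described above. The class name, the semaphore name, and the permit count are hypothetical.

```java
import com.hazelcast.config.Config;
import com.hazelcast.config.cp.SemaphoreConfig;

final class JdkCompatibleSemaphoreConfigExample {
    private JdkCompatibleSemaphoreConfigExample() { }

    static Config build() {
        SemaphoreConfig semaphoreConfig = new SemaphoreConfig("connection-permits")
                // Assumed counterpart of the JDK compatibility option.
                .setJDKCompatible(true)
                // Illustrative number of initial permits.
                .setInitialPermits(10);

        Config config = new Config();
        config.getCPSubsystemConfig()
              .setCPMemberCount(3)
              .addSemaphoreConfig(semaphoreConfig);
        return config;
    }
}
```

Keep in mind the trade-off stated above: in this mode, permits held by a failed caller are not cleaned up automatically.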
Removing Missing CP Members Automatically
If CP Subsystem Persistence is disabled, CP members lose their state after shutting down and so cannot rejoin the CP Subsystem. You can configure CP members to be automatically removed from the CP Subsystem after they shut down as well as how long to wait after they shut down before removing them.
By default, missing CP members are automatically removed from the CP Subsystem after 4 hours and replaced with other available CP members in all their CP groups. You can configure this time, using the missing-cp-member-auto-removal-seconds option.
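Because the option takes a value in seconds, it can help to derive it from a readable duration. The helper below is a hypothetical, standard-library-only sketch and is not part of Hazelcast. It shows that the default of 4 hours corresponds to 14400 seconds.

```java
import java.time.Duration;

/** Converts a wait time into a whole number of seconds for the option value. */
final class AutoRemovalTimeout {
    private AutoRemovalTimeout() { }

    static int toOptionSeconds(Duration wait) {
        if (wait.isNegative()) {
            throw new IllegalArgumentException("wait time must not be negative");
        }
        // Fails with ArithmeticException if the value does not fit in an int.
        return Math.toIntExact(wait.getSeconds());
    }
}
```

For example, AutoRemovalTimeout.toOptionSeconds(Duration.ofHours(4)) returns 14400, and the result can be passed to CPSubsystemConfig.setMissingCPMemberAutoRemovalSeconds.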
If a missing CP member rejoins the cluster after it is automatically removed from the CP Subsystem, that CP member must be terminated manually.
If no CP members are available to replace a missing CP member, the group size of any groups that it was in is reduced and the majority values are recalculated.
When CP Subsystem Persistence is enabled, CP members are not automatically removed from the CP Subsystem. These CP members can restore their CP state from disk and rejoin their CP groups. It is your responsibility to remove CP members if they do not restart.
Handling Indeterminate Operation State
When you invoke an API method on a CP data structure, the method replicates an internal operation to the corresponding CP group. After the CP leader commits this operation to the majority of the CP group, it sends a response to the public API call. If a failure causes loss of the response, then the caller cannot determine if the operation is committed on the CP group or not.
You can handle loss of the response in two ways:
- To allow CP leaders to replicate the operation to the CP group multiple times, set the fail-on-indeterminate-operation-state option to false (default).
- To send an IndeterminateOperationStateException back to the caller, set the fail-on-indeterminate-operation-state option to true.
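The second choice above can be sketched in the Java API as follows. The class name is hypothetical. With this setting, callers should be prepared to receive an IndeterminateOperationStateException and decide for themselves whether retrying the operation is safe.

```java
import com.hazelcast.config.Config;

final class IndeterminateStateConfigExample {
    private IndeterminateStateConfigExample() { }

    static Config build() {
        Config config = new Config();
        config.getCPSubsystemConfig()
              .setCPMemberCount(3)
              // true: report a lost response to the caller instead of
              // replicating the operation again.
              .setFailOnIndeterminateOperationState(true);
        return config;
    }
}
```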
Global Configuration Options
Use these configuration options to configure the CP Subsystem.
Option | Description | Default | Example |
---|---|---|---|
Number of CP members to initialize the CP Subsystem. If set, must be greater than or equal to |
|
||||
Number of CP members to participate in each CP group. If set, this value must conform to the following rules:
- Must be an odd number between |
Same as |
||||
Duration for a CP session to be kept alive after the last received heartbeat. A CP session is closed if no session heartbeat is received during this duration. Must be greater than |
|
||||
Interval in seconds for the periodically committed CP session heartbeats. Must be smaller than |
|
|
|||
Duration in seconds to wait before automatically removing a missing CP member from the CP Subsystem. Must be greater than or equal to A value of
|
|
|
|||
Whether CP Subsystem operations use at-least-once or at-most-once execution guarantees. By default, operations use an at-least-once execution guarantee. If set to |
|
|
|||
Whether CP Subsystem Persistence is globally enabled for CP groups created in the CP Subsystem. If enabled, CP members persist their local CP data to stable storage and can recover from crashes. |
|
||||
Parent directory where persisted CP data is stored. This directory is created automatically if it does not exist. This directory is shared among multiple CP members safely. This is especially useful for cloud environments where CP members generally use a shared filesystem. |
|
||||
Timeout duration in seconds for CP members to restore their persisted data from disk. A CP member fails its startup if it cannot complete its CP data restore process in the configured duration. |
|
Fenced Lock Options
Use the following options to configure fenced locks:
<hazelcast>
  <cp-subsystem>
    <locks>
      <fenced-lock>
        <!-- insert configuration options here -->
      </fenced-lock>
    </locks>
  </cp-subsystem>
</hazelcast>
hazelcast:
  cp-subsystem:
    locks:
      # insert configuration options here
Add configuration options to the FencedLockConfig object.
Config config = new Config();
FencedLockConfig lockConfig = new FencedLockConfig(/*options*/);
config.getCPSubsystemConfig().addLockConfig(lockConfig);
Option | Description | Default | Example |
---|---|---|---|
The name of the fenced lock to configure. |
' ' (empty) |
|
|
Maximum number of reentrant lock acquisitions. Once a caller acquires the lock this many times, it will not be able to acquire the lock again, until it makes at least one |
|
|
Semaphore Options
Use the following options to configure semaphores:
<hazelcast>
  <cp-subsystem>
    <semaphores>
      <semaphore>
        <!-- insert configuration options here -->
      </semaphore>
    </semaphores>
  </cp-subsystem>
</hazelcast>
hazelcast:
  cp-subsystem:
    semaphores:
      # insert configuration options here
Add configuration options to the SemaphoreConfig object.
Config config = new Config();
SemaphoreConfig semaphoreConfig = new SemaphoreConfig(/*insert configuration options here*/);
config.getCPSubsystemConfig().addSemaphoreConfig(semaphoreConfig);
Option | Description | Default | Example |
---|---|---|---|
Name of the semaphore. |
' ' (empty) |
|
|
Whether JDK compatibility is enabled. See Semaphores. |
|
|
|
Number of permits to initialize the semaphore. If a positive value is set, the semaphore is initialized with the given number of permits. |
|
|
Raft Algorithm Options
Use these options to fine-tune the Raft consensus algorithm.
Do not change these settings unless you know what you’re doing.
<hazelcast>
  <cp-subsystem>
    <raft-algorithm>
      <!-- insert configuration options here -->
    </raft-algorithm>
  </cp-subsystem>
</hazelcast>
hazelcast:
  cp-subsystem:
    raft-algorithm:
      # insert configuration options here
Add configuration options to the RaftAlgorithmConfig object.
Config config = new Config();
RaftAlgorithmConfig raftConfig = new RaftAlgorithmConfig();
config.getCPSubsystemConfig().setRaftAlgorithmConfig(raftConfig);
Option | Description | Default | Example |
---|---|---|---|
Leader election timeout in milliseconds. If a candidate cannot win the majority of the votes in time, a new election round is initiated. |
|
|
|
Duration in milliseconds for a Raft leader to send periodic heartbeat messages to its followers to indicate its liveness. Periodic heartbeat messages are actually append entries requests and can contain log entries for the lagging followers. If the value is too small, Raft leaders send heartbeat messages to followers too frequently, which wastes CPU and network bandwidth. |
|
|
|
Maximum number of missed Raft leader heartbeats for a follower to trigger a new leader election round. For example, if |
|
|
|
Maximum number of Raft log entries that can be sent as a batch in a single append entries request. In Hazelcast’s Raft consensus algorithm implementation, a Raft leader maintains a separate replication pipeline for each follower. It sends a new batch of Raft log entries to a follower after the follower acknowledges the last append entries request sent by the leader. |
|
|
|
Number of new commits to initiate a new snapshot after the last snapshot taken by the local Raft member. This value must be configured wisely as it affects the performance of the system in multiple ways. If a small value is set, snapshots are taken too frequently and Raft members keep a very short Raft log. If snapshots are large and CP Subsystem Persistence is enabled, this can create an unnecessary overhead on I/O performance. Moreover, a Raft leader can send too many snapshots to followers, which can create an unnecessary overhead on the network. On the other hand, if a very large value is set, it can create a memory overhead because Raft log entries are kept in memory until the next snapshot. |
|
|
|
Maximum number of uncommitted log entries in the leader’s Raft log before temporarily rejecting new requests of callers. Since Raft leaders send log entries to followers in batches, they accumulate incoming requests in order to improve the throughput. You can configure this field by considering your degree of concurrency in your callers. For instance, if you have at most |
|
|
|
Timeout duration in milliseconds to apply backoff on append entries requests. After a Raft leader sends an append entries request to a follower, it will not send a subsequent append entries request either until the follower responds or this timeout occurs. Backoff durations are increased exponentially if followers remain unresponsive. |
|
|
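The heartbeat period and the maximum missed heartbeat count in the table above interact. The following standard-library-only sketch is not Hazelcast code, and its class and method names are hypothetical. It assumes that a follower triggers a new election once the configured number of consecutive leader heartbeats has been missed, so the silence it tolerates is approximately the product of the two settings.

```java
/** Rough estimate of how long a follower waits before starting a new election. */
final class FollowerElectionWindow {
    private FollowerElectionWindow() { }

    static long millisBeforeNewElection(long leaderHeartbeatPeriodMillis,
                                        int maxMissedHeartbeats) {
        if (leaderHeartbeatPeriodMillis <= 0 || maxMissedHeartbeats <= 0) {
            throw new IllegalArgumentException("both settings must be positive");
        }
        // Assumption: the window is period x missed count; overflow raises an error.
        return Math.multiplyExact(leaderHeartbeatPeriodMillis, (long) maxMissedHeartbeats);
    }
}
```

Under this assumption, a heartbeat period of 1000 milliseconds with five tolerated misses gives a window of 5000 milliseconds. Lowering either setting speeds up failure detection but makes spurious elections more likely when the network is slow or members are briefly paused.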