Split-Brain Protection
The split-brain protection mechanism in Hazelcast protects your cluster when the number of cluster members drops below a specified minimum. How you respond to a split-brain scenario depends on whether data consistency or application availability is your primary concern. In either case, because a split-brain scenario is caused by a network failure, you must identify and correct that failure. Your cluster cannot return to steady-state operation until the underlying network failure is fixed. If consistency is your primary concern, you can use Hazelcast’s split-brain protection feature.
This feature enables you to specify the minimum cluster size required for operations to occur. You do this by configuring a minimum-cluster-size for the cluster. If the cluster size is below this minimum, operations are rejected and return a SplitBrainProtectionException to their callers. Additionally, you can configure this check with a user-defined SplitBrainProtectionFunction, which is consulted on each cluster membership change to determine whether a split-brain has occurred.
Your application continues its operations on the remaining operating cluster. Any application instances connected to a sub-cluster whose size is below the minimum threshold defined by the split-brain protection configuration receive exceptions which, depending on your application logic and monitoring setup, should generate alerts. The key point is that applications are prevented from continuing in error with stale data.
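To make the rejection rule concrete, the following is a minimal, self-contained model of it in plain Java. It is not Hazelcast code: MinimumSizeGuard and its methods are hypothetical names chosen for illustration, and IllegalStateException stands in for the SplitBrainProtectionException that a real member would return to the caller.

```java
import java.util.function.Supplier;

/** Toy model of a minimum-cluster-size check; not part of the Hazelcast API. */
final class MinimumSizeGuard {
    private final int minimumClusterSize;

    MinimumSizeGuard(int minimumClusterSize) {
        if (minimumClusterSize < 1) {
            throw new IllegalArgumentException("minimumClusterSize must be positive");
        }
        this.minimumClusterSize = minimumClusterSize;
    }

    /** Returns true when the observed member count satisfies the minimum. */
    boolean isSatisfied(int observedClusterSize) {
        return observedClusterSize >= minimumClusterSize;
    }

    /**
     * Runs the operation only if enough members are visible; otherwise rejects it.
     * IllegalStateException is a stand-in for SplitBrainProtectionException.
     */
    <T> T execute(int observedClusterSize, Supplier<T> operation) {
        if (!isSatisfied(observedClusterSize)) {
            throw new IllegalStateException("Cluster size " + observedClusterSize
                    + " is below the required minimum " + minimumClusterSize);
        }
        return operation.get();
    }
}
```

In this model, a guard created with a minimum of 4 runs an operation when 5 members are visible and rejects the same operation when only 3 are visible, which is the behavior an application should be prepared to handle.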
Split-brain protection is supported for the following Hazelcast data structures:
-
IMap (for Hazelcast 3.5 and higher versions)
-
Transactional Map (for Hazelcast 3.5 and higher versions)
-
ICache (for Hazelcast 3.5 and higher versions)
-
ILock (for Hazelcast 3.8 and higher versions)
-
IQueue (for Hazelcast 3.8 and higher versions)
-
IExecutorService, DurableExecutorService, IScheduledExecutorService, MultiMap, ISet, IList, Ringbuffer, Replicated Map, Cardinality Estimator, IAtomicLong, IAtomicReference, ISemaphore, ICountDownLatch (for Hazelcast 3.10 and higher versions)
Each data structure to be protected should have the configuration added to it as explained in the Configuring Split-Brain Protection section.
Time Window for Split-Brain Protection
Cluster membership is established and maintained by heartbeats. A network partition makes some members appear unreachable. Although the timing is configurable, it normally takes seconds or tens of seconds before the cluster is adjusted to exclude unreachable members. The cluster size is based on the currently known number of members.
For this reason, there is a time window between the network partition and the application of split-brain protection. The length of this window depends on the failure detector. The guarantee is that every member eventually detects the failed members and rejects operations on data structures that require split-brain protection.
Split-brain protection can be configured with out-of-the-box SplitBrainProtectionFunctions which determine whether there is a split-brain situation independently of the cluster membership manager. These functions take advantage of the heartbeat and other failure-detector information configured on the Hazelcast members.
For more information, see the Consistency and Replication Model chapter.
Configuring Split-Brain Protection
You can set up the split-brain protection configuration either declaratively or programmatically.
Assume that you have a 7-member Hazelcast cluster and you want to require a minimum of four members for the cluster to continue operating. In this case, if a split-brain happens, the sub-clusters of sizes 1, 2 and 3 are prevented from being used. Only a sub-cluster of four or more members is allowed to be used.
It is preferable to start with an odd number of members so that a single network partition (split-brain) cannot create two equal-sized sub-clusters.
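The arithmetic behind this recommendation can be sketched in a few lines of plain Java. The class and method names below are illustrative only and are not part of Hazelcast. The sketch assumes that the minimum cluster size is chosen as a strict majority of the full cluster, which is the choice made in the 7-member example (minimum of four).

```java
/** Arithmetic behind choosing minimum-cluster-size; names are illustrative. */
final class MajorityThreshold {
    private MajorityThreshold() {
    }

    /** Smallest member count that is a strict majority of the full cluster. */
    static int majorityOf(int fullClusterSize) {
        return fullClusterSize / 2 + 1;
    }

    /** True when a sub-cluster of the given size may keep operating. */
    static boolean mayOperate(int subClusterSize, int minimumClusterSize) {
        return subClusterSize >= minimumClusterSize;
    }
}
```

With these helpers, majorityOf(7) is 4, so sub-clusters of 1, 2 and 3 members are rejected and a sub-cluster of 4 is allowed. For an even-sized cluster, majorityOf(6) is also 4, so a single 3/3 partition leaves neither side able to operate, which is why an odd initial size is preferable.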
Member Count Split-Brain Protection
This type of split-brain protection function determines the presence of split-brain protection based on the count of members in the cluster, as observed by the local member’s cluster membership manager. It is available since Hazelcast 3.5. The following are map configurations for the 7-member cluster scenario described above:
<hazelcast>
    ...
    <split-brain-protection name="splitBrainProtectionRuleWithFourMembers" enabled="true">
        <minimum-cluster-size>4</minimum-cluster-size>
    </split-brain-protection>
    <map name="default">
        <split-brain-protection-ref>splitBrainProtectionRuleWithFourMembers</split-brain-protection-ref>
    </map>
    ...
</hazelcast>

hazelcast:
  split-brain-protection:
    splitBrainProtectionRuleWithFourMembers:
      enabled: true
      minimum-cluster-size: 4
  map:
    default:
      split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers

SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig();
splitBrainProtectionConfig.setName("splitBrainProtectionRuleWithFourMembers")
        .setEnabled(true)
        .setMinimumClusterSize(4);
MapConfig mapConfig = new MapConfig();
mapConfig.setSplitBrainProtectionName("splitBrainProtectionRuleWithFourMembers");
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addMapConfig(mapConfig);
Probabilistic Split-Brain Protection Function
The probabilistic split-brain protection function uses a private instance of Phi Accrual Cluster Failure Detector which is updated with member heartbeats and its parameters can be fine-tuned to determine the count of live members in the cluster, independently of the cluster’s membership manager.
This function has the following configuration elements:
-
acceptable-heartbeat-pause-millis: Duration in milliseconds corresponding to the number of potentially lost or delayed heartbeats that are accepted before considering it to be an anomaly. This margin is important to be able to survive sudden, occasional pauses in heartbeat arrivals, due to, for example, garbage collection or network drops. The value must be in the [heartbeat interval, maximum no heartbeat interval] range, otherwise Hazelcast does not start. Its default value is 60000 milliseconds.
-
suspicion-threshold: Threshold for the suspicion (φ) level. A low threshold is prone to generate many wrong suspicions but ensures quick detection in the event of a real crash. Conversely, a high threshold generates fewer mistakes but needs more time to detect actual crashes. Its default value is 10.
-
max-sample-size: Number of samples to use for calculating the mean and standard deviation of inter-arrival times. Its default value is 200.
-
heartbeat-interval-millis: Bootstraps the statistics with heartbeats that correspond to this duration in milliseconds, with a rather high standard deviation (since the environment is unknown in the beginning). Its default value is 5000 milliseconds.
-
min-std-deviation-millis: Minimum standard deviation (in milliseconds) to use for the normal distribution used when calculating phi. Too low a standard deviation might result in too much sensitivity to sudden, but normal, deviations in heartbeat inter-arrival times. Its default value is 100 milliseconds.
<hazelcast>
    ...
    <split-brain-protection enabled="true" name="probabilistic-split-brain-protection">
        <minimum-cluster-size>3</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <probabilistic-split-brain-protection acceptable-heartbeat-pause-millis="5000"
                max-sample-size="500" suspicion-threshold="10" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>probabilistic-split-brain-protection</split-brain-protection-ref>
    </set>
    ...
</hazelcast>

hazelcast:
  split-brain-protection:
    probabilistic-split-brain-protection:
      enabled: true
      minimum-cluster-size: 3
      protect-on: READ_WRITE
      probabilistic-split-brain-protection:
        acceptable-heartbeat-pause-millis: 5000
        max-sample-size: 500
        suspicion-threshold: 10
  set:
    split-brain-protected-set:
      split-brain-protection-ref: probabilistic-split-brain-protection

// The name passed to the builder must match the name referenced by the set configuration below.
SplitBrainProtectionConfig splitBrainProtectionConfig =
        SplitBrainProtectionConfig.newProbabilisticSplitBrainProtectionConfigBuilder("probabilistic-split-brain-protection", 3)
                .withAcceptableHeartbeatPauseMillis(5000)
                .withMaxSampleSize(500)
                .withSuspicionThreshold(10)
                .build();
splitBrainProtectionConfig.setProtectOn(SplitBrainProtectionOn.READ_WRITE);
SetConfig setConfig = new SetConfig("split-brain-protected-set");
setConfig.setSplitBrainProtectionName("probabilistic-split-brain-protection");
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addSetConfig(setConfig);
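To give a feel for how the parameters listed above interact, here is a rough sketch of a phi accrual suspicion calculation in plain Java. It uses the logistic approximation of the normal cumulative distribution that is common in phi accrual implementations. It is an assumption, not a documented fact, that Hazelcast's internal detector computes φ in exactly this way. PhiSketch and its method names are hypothetical.

```java
/**
 * Sketch of a phi accrual suspicion calculation using a logistic
 * approximation of the normal CDF. Illustrative only; not a Hazelcast class.
 */
final class PhiSketch {
    private PhiSketch() {
    }

    /**
     * @param millisSinceLastHeartbeat time elapsed since the last heartbeat
     * @param meanIntervalMillis       mean of the sampled inter-arrival times
     * @param stdDeviationMillis       standard deviation of the sampled inter-arrival times
     * @param minStdDeviationMillis    lower bound applied to the standard deviation
     * @param acceptablePauseMillis    extra delay tolerated before suspicion starts to grow
     * @return the suspicion level phi; higher means the member is more likely down
     */
    static double phi(double millisSinceLastHeartbeat, double meanIntervalMillis,
                      double stdDeviationMillis, double minStdDeviationMillis,
                      double acceptablePauseMillis) {
        // The acceptable pause shifts the expected arrival time to the right.
        double mean = meanIntervalMillis + acceptablePauseMillis;
        // A floor on the deviation avoids over-reacting to very regular heartbeats.
        double stdDeviation = Math.max(stdDeviationMillis, minStdDeviationMillis);
        double y = (millisSinceLastHeartbeat - mean) / stdDeviation;
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        if (millisSinceLastHeartbeat > mean) {
            return -Math.log10(e / (1.0 + e));
        }
        return -Math.log10(1.0 - 1.0 / (1.0 + e));
    }

    /** A member is suspected once phi reaches the configured threshold. */
    static boolean isSuspected(double phi, double suspicionThreshold) {
        return phi >= suspicionThreshold;
    }
}
```

In this sketch, a heartbeat that is exactly one (shifted) mean interval late gives φ of about 0.3, far below a threshold of 10, while φ grows quickly once the delay exceeds the mean by several standard deviations. Raising the acceptable pause delays that growth, and raising the minimum standard deviation flattens it.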
Recently-Active Split-Brain Protection Function
This function can be used to implement a more conservative split-brain protection by requiring that a heartbeat has been received from each member within a configurable time window counted back from the current time.
<hazelcast>
    ...
    <split-brain-protection enabled="true" name="recently-active-split-brain-protection">
        <minimum-cluster-size>4</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <recently-active-split-brain-protection heartbeat-tolerance-millis="60000" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>recently-active-split-brain-protection</split-brain-protection-ref>
    </set>
    ...
</hazelcast>

hazelcast:
  split-brain-protection:
    recently-active-split-brain-protection:
      enabled: true
      minimum-cluster-size: 4
      protect-on: READ_WRITE
      recently-active-split-brain-protection:
        heartbeat-tolerance-millis: 60000
  set:
    split-brain-protected-set:
      split-brain-protection-ref: recently-active-split-brain-protection

SplitBrainProtectionConfig splitBrainProtectionConfig =
        SplitBrainProtectionConfig.newRecentlyActiveSplitBrainProtectionConfigBuilder("recently-active-splitBrainProtection", 4, 60000)
                .build();
splitBrainProtectionConfig.setProtectOn(SplitBrainProtectionOn.READ_WRITE);
SetConfig setConfig = new SetConfig("split-brain-protected-set");
setConfig.setSplitBrainProtectionName("recently-active-splitBrainProtection");
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addSetConfig(setConfig);
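The idea of a recently-active check can be modeled in plain Java as follows. This is a simplified sketch under stated assumptions, not Hazelcast's implementation: the class name, the map of last heartbeat timestamps keyed by a member identifier, and the inclusive boundary of the tolerance window are all hypothetical choices made for illustration.

```java
import java.util.Map;

/** Toy model of a recently-active check; names are illustrative, not Hazelcast API. */
final class RecentlyActiveSketch {
    private RecentlyActiveSketch() {
    }

    /**
     * Counts the members whose last heartbeat lies within the tolerance window
     * and compares that count with the required minimum cluster size.
     *
     * @param lastHeartbeatMillis      last heartbeat timestamp per member identifier
     * @param nowMillis                current time in milliseconds
     * @param heartbeatToleranceMillis length of the tolerance window
     * @param minimumClusterSize       required number of recently active members
     */
    static boolean hasMinimumSize(Map<String, Long> lastHeartbeatMillis, long nowMillis,
                                  long heartbeatToleranceMillis, int minimumClusterSize) {
        long recentlyActive = lastHeartbeatMillis.values().stream()
                .filter(last -> nowMillis - last <= heartbeatToleranceMillis)
                .count();
        return recentlyActive >= minimumClusterSize;
    }
}
```

Compared with a plain member count, this kind of check stops counting a member as soon as its heartbeats fall outside the window, even if the membership manager has not yet removed it from the member list.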
Split-Brain Protection Configuration Reference
The split-brain protection configuration has the following elements:
-
minimum-cluster-size: Minimum number of members required in a cluster for the cluster to remain in an operational state. If the number of members is below the defined minimum at any time, the operations are rejected and the rejected operations return a SplitBrainProtectionException to their callers.
-
protect-on: Type of operations the split-brain protection applies to. Available values are READ, WRITE and READ_WRITE.
-
split-brain-protection-function-class-name: Class name of a SplitBrainProtectionFunction implementation. Allows you to configure split-brain protection with a custom split-brain protection function. It cannot be used in conjunction with probabilistic-split-brain-protection or recently-active-split-brain-protection.
-
split-brain-protection-listeners: Declaration of split-brain protection listeners which are notified on split-brain protection status changes.
-
probabilistic-split-brain-protection: Configures the split-brain protection with a probabilistic protection function. It cannot be used in conjunction with split-brain-protection-function-class-name or recently-active-split-brain-protection.
-
recently-active-split-brain-protection: Configures the split-brain protection with a recently-active protection function. It cannot be used in conjunction with split-brain-protection-function-class-name or probabilistic-split-brain-protection.
Example configuration with custom SplitBrainProtectionFunction implementation
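package my.domain;

// Package names below assume Hazelcast 4.x.
import com.hazelcast.cluster.Member;
import com.hazelcast.splitbrainprotection.SplitBrainProtectionFunction;
import java.util.Collection;

public class CustomSplitBrainProtectionFunction implements SplitBrainProtectionFunction {
    @Override
    public boolean apply(Collection<Member> members) {
        // Implement split-brain detection logic here. Placeholder rule: at least three members.
        return members.size() >= 3;
    }
}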

<hazelcast>
    ...
    <split-brain-protection enabled="true" name="member-count-split-brain-protection">
        <protect-on>READ_WRITE</protect-on>
        <minimum-cluster-size>3</minimum-cluster-size>
        <split-brain-protection-function-class-name>my.domain.CustomSplitBrainProtectionFunction</split-brain-protection-function-class-name>
    </split-brain-protection>
    ...
</hazelcast>

hazelcast:
  split-brain-protection:
    member-count-split-brain-protection:
      enabled: true
      protect-on: READ_WRITE
      minimum-cluster-size: 3
      split-brain-protection-function-class-name: my.domain.CustomSplitBrainProtectionFunction
Configuring Split-Brain Protection Listeners
You can register listeners to be notified about the split-brain protection results. Split-brain protection listeners are local to the member where they are registered, so they receive only events that occurred on that local member.
These listeners can be configured declaratively or programmatically. The following are example configurations.
<hazelcast>
    ...
    <split-brain-protection name="splitBrainProtectionRuleWithFourMembers" enabled="true">
        <minimum-cluster-size>4</minimum-cluster-size>
        <split-brain-protection-listeners>
            <split-brain-protection-listener>
                com.company.splitbrainprotection.FourMemberSplitBrainProtectionListener
            </split-brain-protection-listener>
        </split-brain-protection-listeners>
    </split-brain-protection>
    <map name="default">
        <split-brain-protection-ref>splitBrainProtectionRuleWithFourMembers</split-brain-protection-ref>
    </map>
    ...
</hazelcast>

hazelcast:
  split-brain-protection:
    splitBrainProtectionRuleWithFourMembers:
      enabled: true
      minimum-cluster-size: 4
      split-brain-protection-listener: com.company.splitbrainprotection.FourMemberSplitBrainProtectionListener
  map:
    default:
      split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers

SplitBrainProtectionListenerConfig listenerConfig = new SplitBrainProtectionListenerConfig();
// You can either set your own SplitBrainProtectionListener implementation directly
listenerConfig.setImplementation(new SplitBrainProtectionListener() {
    @Override
    public void onChange(SplitBrainProtectionEvent splitBrainProtectionEvent) {
        if (splitBrainProtectionEvent.isPresent()) {
            // handle SplitBrainProtection presence
        } else {
            // handle SplitBrainProtection absence
        }
    }
});
// Or you can give the name of the class that implements the SplitBrainProtectionListener interface.
listenerConfig.setClassName("com.company.splitbrainprotection.FourMemberSplitBrainProtectionListener");
SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig();
splitBrainProtectionConfig.setName("splitBrainProtectionRuleWithFourMembers")
        .setEnabled(true)
        .setMinimumClusterSize(4)
        .addListenerConfig(listenerConfig);
MapConfig mapConfig = new MapConfig();
mapConfig.setSplitBrainProtectionName("splitBrainProtectionRuleWithFourMembers");
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addMapConfig(mapConfig);
Querying Split-Brain Protection Results
The split-brain protection service lets you query split-brain protection results through SplitBrainProtection instances. Each instance reports the result of one particular split-brain protection. The following is the SplitBrainProtection interface that you can interact with.
/**
 * {@link SplitBrainProtection} provides access to the current status of a split-brain protection.
 */
public interface SplitBrainProtection {
    /**
     * Returns true if the minimum cluster size is satisfied, otherwise false.
     *
     * @return boolean whether the minimum cluster size property is satisfied
     */
    boolean hasMinimumSize();
}
You can retrieve the SplitBrainProtection instance as in the following example.
String splitBrainProtectionName = "at-least-one-storage-member";
SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig();
splitBrainProtectionConfig.setName(splitBrainProtectionName);
splitBrainProtectionConfig.setEnabled(true);
MapConfig mapConfig = new MapConfig();
mapConfig.setSplitBrainProtectionName(splitBrainProtectionName);
Config config = new Config();
config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
config.addMapConfig(mapConfig);
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance(config);
SplitBrainProtectionService splitBrainProtectionService = hazelcastInstance.getSplitBrainProtectionService();
SplitBrainProtection splitBrainProtection = splitBrainProtectionService.getSplitBrainProtection(splitBrainProtectionName);
boolean splitBrainProtectionPresence = splitBrainProtection.hasMinimumSize();