Split-Brain Protection

Split-brain protection mechanism provided in Hazelcast protects your cluster in case the number of cluster members drops below the specified one. How to respond to a split-brain scenario depends on whether consistency of data or availability of your application is of primary concern. In either case, because a split-brain scenario is caused by a network failure, you must initiate an effort to identify and correct the network failure. Your cluster cannot be brought back to steady state operation until the underlying network failure is fixed. If consistency is your primary concern, you can use Hazelcast’s split-brain protection feature.

This feature enables you to specify the minimum cluster size required for operations to occur. This is achieved by defining and configuring a minimum-cluster-size for the cluster. If the cluster size is below this minimum value, the operations are rejected and the rejected operations return a SplitBrainProtectionException to their callers. Additionally, it is possible to configure this size with a user-defined SplitBrainProtectionFunction which is consulted to determine there is no split-brain on each cluster membership change.

Your application continues its operations on the remaining operating cluster. Any application instances connected to the cluster with sizes below the minimum threshold defined by the split-brain protection configuration receive exceptions which, depending on the programming and monitoring setup, should generate alerts. The key point is that rather than applications continuing in error with stale data, they are prevented from doing so.

Split-brain protection is supported for the following Hazelcast data structures:

  • IMap (for Hazelcast 3.5 and higher versions)

  • Transactional Map (for Hazelcast 3.5 and higher versions)

  • ICache (for Hazelcast 3.5 and higher versions)

  • ILock (for Hazelcast 3.8 and higher versions)

  • IQueue (for Hazelcast 3.8 and higher versions)

  • IExecutorService, DurableExecutorService, IScheduledExecutorService, MultiMap, ISet, IList, Ringbuffer, Replicated Map, Cardinality Estimator, IAtomicLong, IAtomicReference, ISemaphore, ICountdownLatch (for Hazelcast 3.10 and higher versions)

Each data structure to be protected should have the configuration added to it as explained in the Configuring Split-Brain Protection section.

Time Window for Split-Brain Protection

Cluster membership is established and maintained by heartbeats. A network partitioning presents some members as being unreachable. While configurable, it is normally seconds or tens of seconds before the cluster is adjusted to exclude unreachable members. The cluster size is based on the currently understood number of members.

For this reason, there will be a time window between the network partitioning and the application of split-brain protection. Length of this window depends on the failure detector. Given guarantee is, every member eventually detects the failed members and rejects the operation on the data structure which requires the split-brain protection.

Split-brain protection can be configured with out-of-the-box SplitBrainProtectionFunctions which determine whether there is a split-brain situation independent of the cluster membership manager. These functions take advantage of the heartbeat and other failure-detector information configured on the Hazelcast members.

For more information, see the Consistency and Replication Model chapter.

Configuring Split-Brain Protection

You can set up the split-brain protection configuration using either declarative or programmatic mechanism.

Assume that you have a 7-member Hazelcast Cluster and you want to set the minimum number of four members for the cluster to continue operating. In this case, if a split-brain happens, the sub-clusters of sizes 1, 2 and 3 are prevented from being used. Only the sub-cluster of four members is allowed to be used.

It is preferable to have an odd-sized initial cluster size to prevent a single network partitioning (split-brain) from creating two equal sized clusters.

Member Count Split-Brain Protection

This type of split-brain protection function determines the presence of split-brain protection based on the count of members in the cluster, as observed by the local member’s cluster membership manager and is available since Hazelcast 3.5. The following are map configurations for the example 7-member cluster scenario described above:

  • XML

  • YAML

  • Java

<hazelcast>
    ...
    <split-brain-protection name="splitBrainProtectionRuleWithFourMembers" enabled="true">
        <minimum-cluster-size>4</minimum-cluster-size>
    </split-brain-protection>
    <map name="default">
        <split-brain-protection-ref>splitBrainProtectionRuleWithFourMembers</split-brain-protection-ref>
    </map>
    ...
</hazelcast>
hazelcast:
  split-brain-protection:
    splitBrainProtectionRuleWithFourMembers:
      enabled: true
      minimum-cluster-size: 4
  map:
    default:
      split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
        SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig();
        splitBrainProtectionConfig.setName("splitBrainProtectionRuleWithFourMembers")
					    .setEnabled(true)
					    .setMinimumClusterSize(4);

        MapConfig mapConfig = new MapConfig();
        mapConfig.setSplitBrainProtectionName("splitBrainProtectionRuleWithFourMembers");

        Config config = new Config();
        config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
        config.addMapConfig(mapConfig);

Probabilistic Split-Brain Protection Function

The probabilistic split-brain protection function uses a private instance of Phi Accrual Cluster Failure Detector which is updated with member heartbeats and its parameters can be fine-tuned to determine the count of live members in the cluster, independently of the cluster’s membership manager.

This function has the following configuration elements:

  • acceptable-heartbeat-pause-millis: Duration in milliseconds corresponding to the number of potentially lost/delayed heartbeats that are accepted before considering it to be an anomaly. This margin is important to be able to survive sudden, occasional, pauses in heartbeat arrivals, due to for example garbage collection or network drops. The value must be in the [heartbeat interval , maximum no heartbeat interval] range, otherwise Hazelcast does not start. Its default value is 60000 milliseconds.

  • suspicion-threshold: Threshold for suspicion (φ) level. A low threshold is prone to generate many wrong suspicions but ensures a quick detection in the event of a real crash. Conversely, a high threshold generates fewer mistakes but needs more time to detect actual crashes. Its default value is 10.

  • max-sample-size: Number of samples to use for calculation of mean and standard deviation of inter-arrival times. Its default value is 200.

  • heartbeat-interval-millis: Bootstrap the stats with heartbeats that corresponds to this duration in milliseconds, with a rather high standard deviation (since environment is unknown in the beginning). Its default value is 5000 milliseconds.

  • min-std-deviation-millis: Minimum standard deviation (in milliseconds) to use for the normal distribution used when calculating phi. Too low standard deviation might result in too much sensitivity for sudden, but normal, deviations in heartbeat inter arrival times. Its default value is 100 milliseconds.

  • XML

  • YAML

  • Java

<hazelcast>
    ...
    <split-brain-protection enabled="true" name="probabilistic-split-brain-protection">
        <minimum-cluster-size>3</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <probabilistic-split-brain-protection acceptable-heartbeat-pause-millis="5000"
                max-sample-size="500" suspicion-threshold="10" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>probabilistic-split-brain-protection</split-brain-protection-ref>
    </set>
    ...
</hazelcast>
hazelcast:
  split-brain-protection:
    probabilistic-split-brain-protection:
      enabled: true
      minimum-cluster-size: 3
      protect-on: READ_WRITE
      probabilistic-split-brain-protection:
        acceptable-heartbeat-pause-millis: 5000
        max-sample-size: 500
        suspicion-threshold: 10
  set:
    split-brain-protected-set:
      split-brain-protection-ref: probabilistic-split-brain-protection
        SplitBrainProtectionConfig splitBrainProtectionConfig =
                SplitBrainProtectionConfig.newProbabilisticSplitBrainProtectionConfigBuilder("probabilist-splitBrainProtection", 3)
                        .withAcceptableHeartbeatPauseMillis(5000)
                        .withMaxSampleSize(500)
                        .withSuspicionThreshold(10)
                        .build();
        splitBrainProtectionConfig.setProtectOn(SplitBrainProtectionOn.READ_WRITE);
        SetConfig setConfig = new SetConfig("split-brain-protected-set");
        setConfig.setSplitBrainProtectionName("probabilist-splitBrainProtection");
        Config config = new Config();
        config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
        config.addSetConfig(setConfig);

Recently-Active Split-Brain Protection Function

This function can be used to implement a more conservative split-brain protection by requiring that a heartbeat has been received from each member within a configurable time window since now.

  • XML

  • YAML

  • Java

<hazelcast>
    ...
    <split-brain-protection enabled="true" name="recently-active-split-brain-protection">
        <minimum-cluster-size>4</minimum-cluster-size>
        <protect-on>READ_WRITE</protect-on>
        <recently-active-split-brain-protection heartbeat-tolerance-millis="60000" />
    </split-brain-protection>
    <set name="split-brain-protected-set">
        <split-brain-protection-ref>recently-active-split-brain-protection</split-brain-protection-ref>
    </set>
    ...
</hazelcast>
hazelcast:
  split-brain-protection:
    recently-active-split-brain-protection:
      enabled: true
      minimum-cluster-size: 4
      protect-on: READ_WRITE
      recently-active-split-brain-protection:
        heartbeat-tolerance-millis: 60000
  set:
    split-brain-protected-set:
      split-brain-protection-ref: recently-active-split-brain-protection
        SplitBrainProtectionConfig splitBrainProtectionConfig =
                SplitBrainProtectionConfig.newRecentlyActiveSplitBrainProtectionConfigBuilder("recently-active-splitBrainProtection", 4, 60000)
                        .build();
        splitBrainProtectionConfig.setProtectOn(SplitBrainProtectionOn.READ_WRITE);
        SetConfig setConfig = new SetConfig("split-brain-protected-set");
        setConfig.setSplitBrainProtectionName("recently-active-splitBrainProtection");
        Config config = new Config();
        config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
        config.addSetConfig(setConfig);

Split-Brain Protection Configuration Reference

The split-brain protection configuration has the following elements:

  • minimum-cluster-size: Minimum number of members required in a cluster for the cluster to remain in an operational state. If the number of members is below the defined minimum at any time, the operations are rejected and the rejected operations return a SplitBrainProtectionException to their callers.

  • protect-on: Type of the cluster split-brain protection. Available values are READ, WRITE and READ_WRITE.

  • split-brain-protection-function-class-name: Class name of a SplitBrainProtectionFunction implementation, allows to configure split-brain protection with a custom split-brain protection function. It cannot be used in conjunction with probabilistic-split-brain-protection or recently-active-split-brain-protection.

  • split-brain-protection-listeners: Declaration of split-brain protection listeners which are notified on split-brain protection status changes.

  • probabilistic-split-brain-protection: Configures the split-brain protection with a probabilistic protection function. It cannot be used in conjunction with split-brain-protection-function-class-name or recently-active-split-brain-protection.

  • recently-active-split-brain-protection: Configures the split-brain protection with a recently-active protection function. It cannot be used in conjunction with split-brain-protection-function-class-name or probabilistic-split-brain-protection.

Example configuration with custom SplitBrainProtectionFunction implementation

package my.domain;

public class CustomSplitBrainProtectionFunction implements SplitBrainProtectionFunction {
        @Override
        public boolean apply(Collection<Member> members) {
            // implement split-brain detection logic here
        }
    }
  • XML

  • YAML

<hazelcast>
    ...
    <split-brain-protection enabled="true" name="member-count-split-brain-protection">
        <protect-on>READ_WRITE</protect-on>
        <minimum-cluster-size>3</minimum-cluster-size>
        <split-brain-protection-function-class-name>my.domain.CustomSplitBrainProtectionFunction</split-brain-protection-function-class-name>
    </split-brain-protection>
    ...
</hazelcast>
hazelcast:
  split-brain-protection:
    member-count-split-brain-protection:
      enabled: true
      protect-on: READ_WRITE
      minimum-cluster-size: 3
      split-brain-protection-function-class-name: my.domain.CustomSplitBrainProtectionFunction

Configuring Split-Brain Protection Listeners

You can register listeners to be notified about the split-brain protection results. Split-brain protection listeners are local to the member where they are registered, so they receive only events that occurred on that local member.

These listeners can be configured via declarative or programmatic configuration. The following examples are such configurations.

  • XML

  • YAML

  • Java

<hazelcast>
    ...
    <split-brain-protection name="splitBrainProtectionRuleWithFourMembers" enabled="true">
        <minimum-cluster-size>4</minimum-cluster-size>
        <split-brain-protection-listeners>
            <split-brain-protection-listener>
               com.company.splitbrainprotection.FourMemberSplitBrainProtectionListener
            </split-brain-protection-listener>
        </split-brain-protection-listeners>
    </split-brain-protection>
    <map name="default">
        <split-brain-protection-ref>splitBrainProtectionRuleWithFourMembers</split-brain-protection-ref>
    </map>
    ...
</hazelcast>
hazelcast:
  split-brain-protection:
    splitBrainProtectionRuleWithFourMembers:
      enabled: true
      minimum-cluster-size: 4
      split-brain-protection-listener: com.company.splitbrainprotection.FourMemberSplitBrainProtectionListener
  map:
    default:
      split-brain-protection-ref: splitBrainProtectionRuleWithFourMembers
        SplitBrainProtectionListenerConfig listenerConfig = new SplitBrainProtectionListenerConfig();
        // You can either directly set SplitBrainProtection listener implementation of your own
        listenerConfig.setImplementation(new SplitBrainProtectionListener() {
            @Override
            public void onChange(SplitBrainProtectionEvent splitBrainProtectionEvent) {
                if (splitBrainProtectionEvent.isPresent()) {
                    // handle SplitBrainProtection presence
                } else {
                    // handle SplitBrainProtection absence
                }
            }
        });
        // Or you can give the name of the class that implements SplitBrainProtectionListener interface.
        listenerConfig.setClassName("com.company.splitBrainProtection.ThreeMemberSplitBrainProtectionListener");

        SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig();
        splitBrainProtectionConfig.setName("splitBrainProtectionRuleWithFourMembers")
        					    .setEnabled(true)
        					    .setMinimumClusterSize(4)
        					    .addListenerConfig(listenerConfig);


        MapConfig mapConfig = new MapConfig();
        mapConfig.setSplitBrainProtectionName("splitBrainProtectionRuleWithFourMembers");

        Config config = new Config();
        config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
        config.addMapConfig(mapConfig);

Querying Split-Brain Protection Results

Split-brain protection service gives you the ability to query split-brain protection results over the SplitBrainProtection instances. These instances let you query the result of a particular split-brain protection.

The following is a SplitBrainProtection interface that you can interact with.

/**
 * {@link SplitBrainProtection} provides access to the current status of a split-brain protection.
 */
public interface SplitBrainProtection {
    /**
     * Returns true if the minimum cluster size is satisfied, otherwise false.
     *
     * @return boolean whether the minimum cluster size property is satisfied
     */
    boolean hasMinimumSize();
}

You can retrieve the SplitBrainProtection instance as in the following example.

        String splitBrainProtectionName = "at-least-one-storage-member";
        SplitBrainProtectionConfig splitBrainProtectionConfig = new SplitBrainProtectionConfig();
        splitBrainProtectionConfig.setName(splitBrainProtectionName);
        splitBrainProtectionConfig.setEnabled(true);

        MapConfig mapConfig = new MapConfig();
        mapConfig.setSplitBrainProtectionName(splitBrainProtectionName);

        Config config = new Config();
        config.addSplitBrainProtectionConfig(splitBrainProtectionConfig);
        config.addMapConfig(mapConfig);

        HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance(config);
        SplitBrainProtectionService splitBrainProtectionService = hazelcastInstance.getSplitBrainProtectionService();
        SplitBrainProtection splitBrainProtection = splitBrainProtectionService.getSplitBrainProtection(splitBrainProtectionName);

        boolean splitBrainProtectionPresence = splitBrainProtection.hasMinimumSize();