Failure Detector Configuration
A failure detector is responsible to determine if a member in the cluster is unreachable or crashed. The most important problem in failure detection is to distinguish whether a member is still alive but slow or has crashed. But according to the famous FLP result, it is impossible to distinguish a crashed member from a slow one in an asynchronous system. A workaround to this limitation is to use unreliable failure detectors. An unreliable failure detector allows a member to suspect that others have failed, usually based on liveness criteria but it can make mistakes to a certain degree.
Hazelcast has the following built-in failure detectors: Deadline Failure Detector and Phi Accrual Failure Detector.
There is also a Ping Failure Detector, that, if enabled, works in parallel with the above ones, but identifies the failures on OSI Layer 3 (Network Layer). This detector is by default disabled.
Note that, Hazelcast also offers failure detectors for its Java client. See the Client Failure Detectors section for more information.
Deadline Failure Detector
Deadline Failure Detector uses an absolute timeout for missing/lost heartbeats. After timeout, a member is considered as crashed/unavailable and marked as suspected.
Deadline Failure Detector has the following configuration properties:
-
hazelcast.heartbeat.interval.seconds
: This is the interval at which member heartbeat messages are sent to each other. -
hazelcast.max.no.heartbeat.seconds
: This is the timeout which defines when a cluster member is suspected because it has not sent any heartbeats.
To use Deadline Failure Detector, the configuration property
hazelcast.heartbeat.failuredetector.type
should be set to "deadline"
.
Declarative Configuration:
<hazelcast>
...
<properties>
<property name="hazelcast.heartbeat.failuredetector.type">deadline</property>
<property name="hazelcast.heartbeat.interval.seconds">5</property>
<property name="hazelcast.max.no.heartbeat.seconds">120</property>
</properties>
...
</hazelcast>
hazelcast:
properties:
hazelcast.heartbeat.failuredetector.type: deadline
hazelcast.heartbeat.interval.seconds: 5
hazelcast.max.no.heartbeat.seconds: 120
Programmatic Configuration:
Config config = ...;
config.setProperty("hazelcast.heartbeat.failuredetector.type", "deadline");
config.setProperty("hazelcast.heartbeat.interval.seconds", "5");
config.setProperty("hazelcast.max.no.heartbeat.seconds", "120");
[...]
Deadline Failure Detector is the default failure detector in Hazelcast. |
Phi Accrual Failure Detector
This is the failure detector based on The Phi Accrual Failure Detector' by Hayashibara et al.
Phi Accrual Failure Detector keeps track of the intervals between heartbeats in a sliding window of time and measures the mean and variance of these samples and calculates a value of suspicion level (Phi). The value of phi increases when the period since the last heartbeat gets longer. If the network becomes slow or unreliable, the resulting mean and variance increase, there needs to be a longer period for which no heartbeat is received before the member is suspected.
The hazelcast.heartbeat.interval.seconds
and hazelcast.max.no.heartbeat.seconds
properties still can be used as period of heartbeat messages and deadline of
heartbeat messages. Since Phi Accrual Failure Detector is adaptive to network
conditions, a much lower hazelcast.max.no.heartbeat.seconds
can be defined than
Deadline Failure Detector's timeout.
In addition to the above two properties, Phi Accrual Failure Detector has the following configuration properties:
-
hazelcast.heartbeat.phiaccrual.failuredetector.threshold
: This is the phi threshold for suspicion. After calculated phi exceeds this threshold, a member is considered as unreachable and marked as suspected. A low threshold allows to detect member crashes/failures faster but can generate more mistakes and cause wrong member suspicions. A high threshold generates fewer mistakes but is slower to detect actual crashes/failures.phi = 1
means likeliness that we will make a mistake is about10%
. The likeliness is about1%
withphi = 2
,0.1%
withphi = 3
and so on. Default phi threshold is 10. -
hazelcast.heartbeat.phiaccrual.failuredetector.sample.size
: Number of samples to keep for history. Its default value is 200. -
hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis
: Minimum standard deviation to use for the normal distribution used when calculating phi. Too low standard deviation might result in too much sensitivity.
To use Phi Accrual Failure Detector, configuration property
hazelcast.heartbeat.failuredetector.type
should be set to "phi-accrual"
.
Declarative Configuration:
<hazelcast>
...
<properties>
<property name="hazelcast.heartbeat.failuredetector.type">phi-accrual</property>
<property name="hazelcast.heartbeat.interval.seconds">1</property>
<property name="hazelcast.max.no.heartbeat.seconds">60</property>
<property name="hazelcast.heartbeat.phiaccrual.failuredetector.threshold">10</property>
<property name="hazelcast.heartbeat.phiaccrual.failuredetector.sample.size">200</property>
<property name="hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis">100</property>
</properties>
...
</hazelcast>
hazelcast:
properties:
hazelcast.heartbeat.failuredetector.type: phi-accrual
hazelcast.heartbeat.interval.seconds: 1
hazelcast.max.no.heartbeat.seconds: 60
hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200
hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 100
Programmatic Configuration:
Config config = ...;
config.setProperty("hazelcast.heartbeat.failuredetector.type", "phi-accrual");
config.setProperty("hazelcast.heartbeat.interval.seconds", "1");
config.setProperty("hazelcast.max.no.heartbeat.seconds", "60");
config.setProperty("hazelcast.heartbeat.phiaccrual.failuredetector.threshold", "10");
config.setProperty("hazelcast.heartbeat.phiaccrual.failuredetector.sample.size", "200");
config.setProperty("hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis", "100");
[...]
Ping Failure Detector
The Ping Failure Detector may be configured in addition to one of Deadline and Phi Accrual Failure Detectors. It operates at Layer 3 of the OSI protocol and provides much quicker and more deterministic detection of hardware and other lower level events. This detector may be configured to perform an extra check after a member is suspected by one of the other detectors, or it can work in parallel, which is the default. This way hardware and network level issues are detected more quickly.
This failure detector is based on InetAddress.isReachable()
.
When the JVM process has enough permissions to create RAW sockets, the
implementation chooses to rely on ICMP Echo requests. This is preferred.
If there are not enough permissions, it can be configured to fallback on attempting a TCP Echo on port 7. In the latter case, both a successful connection or an explicit rejection is treated as "Host is Reachable". Or, it can be forced to use only RAW sockets. This is not preferred as each call creates a heavy weight socket and moreover the Echo service is typically disabled.
For the Ping Failure Detector to rely only on ICMP Echo requests, there are some criteria that need to be met.
Requirements and Linux/Unix Configuration
-
Supported OS: as of Java 1.8 only Linux/Unix environments are supported. This detector relies on ICMP, i.e., the protocol behind the
ping
command. It tries to issue the ping attempts periodically, and their responses are used to determine the reachability of the remote member. However, you cannot simply create an ICMP Echo Request because these type of packets do not rely on any of the preexisting transport protocols such as TCP. In order to create such a request, you must have the privileges to create RAW sockets (see https://linux.die.net/man/7/raw). Most operating systems allow this to the root users, however Unix based ones are more flexible and allow the use of custom privileges per process instead of requiring root access. Therefore, this detector is supported only on Linux. -
The Java executable must have the
cap_net_raw
capability. As described in the above requirement, on Linux, you have the ability to define extra capabilities to a single process, which would allow the process to interact with the RAW sockets. This interaction is achieved via the capabilitycap_net_raw
(see https://linux.die.net/man/7/capabilities). To enable this capability run the following command:sudo setcap cap_net_raw=+ep <JDK_HOME>/jre/bin/java
-
When running with custom capabilities, the dynamic linker on Linux rejects loading the libs from untrusted paths. Since you have now
cap_net_raw
as a custom capability for a process, it becomes suspicious to the dynamic linker and throws an error:java: error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory
-
To overcome this rejection, the
<JDK_HOME>/jre/lib/amd64/jli/
path needs to be added in theld.conf
. Run the following command to do this:echo "<JDK_HOME>/jre/lib/amd64/jli/" >> /etc/ld.so.conf.d/java.conf && sudo ldconfig
-
-
ICMP Echo Requests must not be blocked by the receiving hosts.
/proc/sys/net/ipv4/icmp_echo_ignore_all
set to0
. Run the following command:echo 0 > /proc/sys/net/ipv4/icmp_echo_ignore_all
If any of the above criteria isn’t met, then the isReachable
always falls back on TCP Echo attempts on port 7.
To be able to use the Ping Failure Detector, please add the following properties in your Hazelcast declarative configuration file:
<hazelcast>
...
<properties>
<property name="hazelcast.icmp.enabled">true</property>
<property name="hazelcast.icmp.parallel.mode">true</property>
<property name="hazelcast.icmp.timeout">1000</property>
<property name="hazelcast.icmp.max.attempts">3</property>
<property name="hazelcast.icmp.interval">1000</property>
<property name="hazelcast.icmp.ttl">0</property>
</properties>
...
</hazelcast>
The following are the property descriptions:
-
hazelcast.icmp.enabled
: Enables legacy ICMP detection mode, works cooperatively with the existing failure detector and only kicks-in after a pre-defined period has passed with no heartbeats from a member. Its default value isfalse
. -
hazelcast.icmp.parallel.mode
: Enabling the parallel ping detector, works separately from the other detectors. Its default value istrue
. -
hazelcast.icmp.timeout
: Number of milliseconds until a ping attempt is considered failed if there was no reply. Its default value is1000
milliseconds. -
hazelcast.icmp.max.attempts
: Maximum number of ping attempts before the member/node gets suspected by the detector. Its default value is3
. -
hazelcast.icmp.interval
: Interval, in milliseconds, between each ping attempt. 1000ms (1 sec) is also the minimum interval allowed. Its default value is1000
milliseconds. -
hazelcast.icmp.ttl
: Maximum number of hops the packets should go through. Its default value is0
.
In the above configuration, the Ping detector attempts 3 pings, one every second and waits up to 1 second for each to complete. If after 3 seconds, there was no successful ping, the member gets suspected.
To enforce the Requirements, the property hazelcast.icmp.echo.fail.fast.on.startup
can also be set to true
, in which case, if any of the requirements
isn’t met, Hazelcast fails to start.
Below is a summary table of all possible configuration combinations of the ping failure detector.
ICMP | Parallel | Fail-Fast | Description | Linux | Windows | macOS |
---|---|---|---|---|---|---|
false |
false |
false |
Completely disabled |
N/A |
N/A |
N/A |
true |
false |
false |
Legacy ping mode. This works hand-to-hand with the OSI Layer 7 failure
detector (see. phi or deadline in the sections above). Ping in this mode
only kicks in after a period when there are no heartbeats received, in which
case the remote Hazelcast member is pinged up to a configurable count of attempts.
If all those attempts fail, the member gets suspected. You can configure this
attempt count using the |
Supported ICMP Echo if available - Falls back on TCP Echo on port 7 |
Supported TCP Echo on port 7 |
Supported ICMP Echo if available - Falls back on TCP Echo on port 7 |
true |
true |
false |
Parallel ping detector, works in parallel with the configured failure detector. Checks periodically if members are live (OSI Layer 3) and suspects them immediately, regardless of the other detectors. |
Supported ICMP Echo if available - Falls back on TCP Echo on port 7 |
Supported TCP Echo on port 7 |
Supported ICMP Echo if available - Falls back on TCP Echo on port 7 |
true |
true |
true |
Parallel ping detector, works in parallel with the configured failure detector. Checks periodically if members are live (OSI Layer 3) and suspects them immediately, regardless of the other detectors. |
Supported - Requires OS Configuration Enforcing ICMP Echo if available - No start up if not available |
Not Supported |
Not Supported - Requires root privileges |