Phi Accrual Failure Detector
This is the failure detector based on The Phi Accrual Failure Detector' by Hayashibara et al.
Phi Accrual Failure Detector keeps track of the intervals between heartbeats in a sliding window of time and measures the mean and variance of these samples and calculates a value of suspicion level (Phi). The value of phi increases when the period since the last heartbeat gets longer. If the network becomes slow or unreliable, the resulting mean and variance increase, there needs to be a longer period for which no heartbeat is received before the member is suspected.
properties still can be used as period of heartbeat messages and deadline of
heartbeat messages. Since Phi Accrual Failure Detector is adaptive to network
conditions, a much lower
hazelcast.max.no.heartbeat.seconds can be defined than
Deadline Failure Detector's timeout.
In addition to the above two properties, Phi Accrual Failure Detector has the following configuration properties:
hazelcast.heartbeat.phiaccrual.failuredetector.threshold: This is the phi threshold for suspicion. After calculated phi exceeds this threshold, a member is considered as unreachable and marked as suspected. A low threshold allows to detect member crashes/failures faster but can generate more mistakes and cause wrong member suspicions. A high threshold generates fewer mistakes but is slower to detect actual crashes/failures.
phi = 1means likeliness that we will make a mistake is about
10%. The likeliness is about
phi = 2,
phi = 3and so on. Default phi threshold is 10.
hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: Number of samples to keep for history. Its default value is 200.
hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: Minimum standard deviation to use for the normal distribution used when calculating phi. Too low standard deviation might result in too much sensitivity.
To use Phi Accrual Failure Detector, configuration property
hazelcast.heartbeat.failuredetector.type should be set to
<hazelcast> ... <properties> <property name="hazelcast.heartbeat.failuredetector.type">phi-accrual</property> <property name="hazelcast.heartbeat.interval.seconds">1</property> <property name="hazelcast.max.no.heartbeat.seconds">60</property> <property name="hazelcast.heartbeat.phiaccrual.failuredetector.threshold">10</property> <property name="hazelcast.heartbeat.phiaccrual.failuredetector.sample.size">200</property> <property name="hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis">100</property> </properties> ... </hazelcast>
hazelcast: properties: hazelcast.heartbeat.failuredetector.type: phi-accrual hazelcast.heartbeat.interval.seconds: 1 hazelcast.max.no.heartbeat.seconds: 60 hazelcast.heartbeat.phiaccrual.failuredetector.sample.size: 200 hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis: 100
Config config = ...; config.setProperty("hazelcast.heartbeat.failuredetector.type", "phi-accrual"); config.setProperty("hazelcast.heartbeat.interval.seconds", "1"); config.setProperty("hazelcast.max.no.heartbeat.seconds", "60"); config.setProperty("hazelcast.heartbeat.phiaccrual.failuredetector.threshold", "10"); config.setProperty("hazelcast.heartbeat.phiaccrual.failuredetector.sample.size", "200"); config.setProperty("hazelcast.heartbeat.phiaccrual.failuredetector.min.std.dev.millis", "100"); [...]