Health Check and Monitoring
Hazelcast provides the HTTP-based Health Check endpoint, Health Check script and Health Monitoring utility.
To be able to benefit from the Health Check endpoint and script, you must enable the Health Check using either one of the following configuration options:
Using the network
configuration element:
<hazelcast>
...
<network>
<rest-api enabled="true">
<endpoint-group name=“HEALTH_CHECK” enabled=“true”/>
</rest-api>
</network>
...
</hazelcast>
hazelcast:
network:
rest-api:
enabled: true
endpoint-groups:
HEALTH_CHECK:
enabled: true
Using the advanced-network
configuration element:
<hazelcast>
...
<advanced-network>
<rest-server-socket-endpoint-config>
<endpoint-groups>
<endpoint-group name=“HEALTH_CHECK” enabled=“true”/>
</endpoint-groups>
</rest-server-socket-endpoint-config>
</advanced-network>
...
</hazelcast>
hazelcast:
advanced-network:
rest-server-socket-endpoint-config:
endpoint-groups:
HEALTCH_CHECK:
enabled: true
Health Check
This is Hazelcast’s HTTP based health check implementation which provides basic information about your cluster and member (on which it is launched).
First, you need to enable the health check as explained in the introduction of this section above.
Now you retrieve information about your cluster’s health status (member state,
cluster state, cluster size, etc.) by launching http://<your member's host IP>:5701/hazelcast/health
on your preferred browser.
The following is the output with example values:
{
"nodeState": "ACTIVE",
"clusterState": "ACTIVE",
"clusterSafe": true,
"migrationQueueSize": 0,
"clusterSize": 3
}
nodeState
: Specifies the state of member on which the health check is launched;
see the Member States section
to learn about the states of cluster members.
clusterState
: Specifies the state of cluster which the health checked member belongs to;
see the Cluster States section
to learn about the states of clusters.
clusterSafe
: Specifies whether the cluster is safe, i.e., there are no active partition
migrations and all backups are in sync for each partition in the cluster;
see the Safety Checking section
to learn about the safety of clusters.
migrationQueueSize
: Specifies the count of remaining migration tasks while the cluster data
is being repartitioned. See the Data Partitioning section
to learn about Hazelcast’s partitioning mechanism.
clusterSize
: Specifies the cluster member count.
Using the healthcheck.sh Script
The healthcheck.sh
script comes with the Hazelcast package. Internally, it uses
the HTTP-based Health Check endpoint. You will need to enable the endpoint by using the
advanced-network
or the network
configuration element.
See the Health Check and Monitoring section.
You can use the script to check health parameters in the following manner:
./healthcheck.sh <parameters>
The following parameters can be used:
Parameter | Default Value | Description |
---|---|---|
|
|
Health check operation. It can be |
|
|
Defines the IP address of a cluster member. If you want to manage your cluster remotely, you should use this parameter to provide the IP address of a member to this script. |
|
|
Defines on which port Hazelcast is running on the local or remote machine. |
|
no argument expected |
Lists the parameter descriptions along with a usage example. |
|
no argument expected |
Prints error output. |
|
no argument expected |
Uses HTTPS protocol for REST calls. |
|
set of well-known CA certificates |
Defines trusted PEM-encoded certificate file path. It’s used to verify member certificates. |
|
None |
Defines PEM-encoded client certificate file path. Only needed when client certificate authentication is used. |
|
None |
Defines PEM-encoded client private key file path. Only needed when client certificate authentication is used. |
|
no argument expected |
Disables member certificate verification. |
Example 1: Checking Member State of a Healthy Cluster:
Assuming the member is deployed under the address 127.0.0.1:5701
and
it is in the healthy state, the following output is expected:
./healthcheck.sh -a 127.0.0.1 -p 5701 -o node-state
ACTIVE
Example 2: Checking Safety of a Non-Existing Cluster:
Assuming there is no member running under the address 127.0.0.1:5701
, the following output is expected:
./healthcheck.sh -a 127.0.0.1 -p 5701 -o cluster-safe
Error while checking health of hazelcast cluster on ip 127.0.0.1 on port 5701.
Please check that cluster is running and that health check is enabled in REST API configuration.
Health Monitor
Health monitor periodically prints logs in your console to provide information about your member’s state. By default, it is enabled when you start your cluster.
You can set the interval of health monitoring using
the hazelcast.health.monitoring.delay.seconds
system property. Its default value is 20 seconds.
The system property hazelcast.health.monitoring.level
is used to configure the monitoring’s log level.
If it is set to OFF, the monitoring is disabled.
If it is set to NOISY, monitoring logs are always printed for the defined intervals.
When it is SILENT, which is the default value, monitoring logs are printed only when
the values exceed some predefined thresholds. These thresholds are related to
memory and CPU percentages, and can be configured using the
hazelcast.health.monitoring.threshold.memory.percentage
and
hazelcast.health.monitoring.threshold.cpu.percentage
system properties, whose default values are both 70.
The following is an example monitoring output
Sep 08, 2017 5:02:28 PM com.hazelcast.internal.diagnostics.HealthMonitor
INFO: [192.168.2.44]:5701 [host-name] [3.9] processors=4, physical.memory.total=16.0G, physical.memory.free=5.5G, swap.space.total=0, swap.space.free=0, heap.memory.used=102.4M,
heap.memory.free=249.1M, heap.memory.total=351.5M, heap.memory.max=3.6G, heap.memory.used/total=29.14%, heap.memory.used/max=2.81%, minor.gc.count=4, minor.gc.time=68ms, major.gc.count=1,
major.gc.time=41ms, load.process=0.44%, load.system=1.00%, load.systemAverage=315.48%, thread.count=97, thread.peakCount=98, cluster.timeDiff=0, event.q.size=0, executor.q.async.size=0,
executor.q.client.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operations.size=0,
executor.q.priorityOperation.size=0, operations.completed.count=226, executor.q.mapLoad.size=0, executor.q.mapLoadAllKeys.size=0, executor.q.cluster.size=0, executor.q.response.size=0,
operations.running.count=0, operations.pending.invocations.percentage=0.00%, operations.pending.invocations.count=0, proxy.count=0, clientEndpoint.count=1,
connection.active.count=2, client.connection.count=1, connection.count=1
See the Configuring with System Properties section to learn how to set system properties. |
Using Health Check on F5 BIG-IP LTM
The F5® BIG-IP® Local Traffic Manager™ (LTM) can be used as a load balancer for Hazelcast cluster members. This section describes how you can configure a health monitor to check the Hazelcast member states.
Monitor Types
Following types of monitors can be used to track Hazelcast cluster members:
-
HTTP Monitor: A custom HTTP monitor enables you to send a command to Hazelcast’s Health Check API using HTTP requests. This is a good choice if SSL/TLS is not enabled in your cluster.
-
HTTPS Monitor: A custom HTTPS monitor enables you to verify the health of Hazelcast cluster members by sending a command to Hazelcast’s Health Check API using Secure Socket Layer (SSL) security. This is a good choice if SSL/TLS is enabled in your cluster.
-
TCP\_HALF\_OPEN Monitor: A TCP\_HALF\_OPEN monitor is a very basic monitor that only checks that the TCP port used by Hazelcast is open and responding to connection requests. It does not interact with the Hazelcast Health Check API. The TCP\_HALF\_OPEN monitor can be used with or without SSL/TLS.
Configuration
After signing in to the BIG-IP LTM User Interface, follow F5’s ^instructions to create a new monitor. Next, apply the following configuration according to your monitor type.
HTTP/HTTPS Monitors
Please note that you should enable the Hazelcast health check for
HTTP/HTTPS monitors to run. You will need to enable the endpoint by using the
advanced-network or the network configuration element.
See the Health Check and Monitoring section.
|
Using a GET request:
-
Set the “Send String” as follows:
GET /hazelcast/health HTTP/1.1\r\n\nHost: [HOST-ADDRESS-OF-HAZELCAST-MEMBER] \r\nConnection: Close\r\n\r\n
-
Set the “Receive String” as follows:
{"nodeState":"ACTIVE","clusterState":"ACTIVE","clusterSafe":true,"migrationQueueSize":0,"clusterSize":([^\s]+)}
The BIG-IP LTM monitors accept regular expressions in these strings allowing you to configure them as needed. The example provided above remains green even if the cluster size changes.
Using a HEAD request:
-
Set the “Send String” as follows:
HEAD /hazelcast/health HTTP/1.1\r\n\nHost: [HOST-ADDRESS-OF-HAZELCAST-MEMBER] \r\nConnection: Close\r\n\r\n
-
Set the “Receive String” as follows:
200 OK
As you can see, the HEAD request only checks for a 200 OK
response.
A Hazelcast cluster member sends this status code when it is alive and running without an issue.
This provides a very basic health check. For increased flexibility, we recommend using the GET request API.