Kubernetes Auto Discovery
Before you start a Hazelcast cluster in Kubernetes, it’s important to configure settings to ensure that members can form a cluster and to prevent any unexpected data loss.
Discovering Members
To make it easier to set up clusters in Kubernetes, Hazelcast allows members to discover each other automatically, using discovery modes. Configure your members with one of the following discovery modes to allow them to form a cluster:
| Kubernetes API | DNS Lookup |
---|---|---|
Description | Uses REST calls to Kubernetes master to fetch IP addresses of Pods | Uses DNS to resolve IPs of Pods related to the given service |
Pros | Flexible, supports 3 different options | No additional configuration required; resolving DNS does not require granting any permissions |
Cons | Requires setting up RoleBinding (to allow access to Kubernetes API) | Limited to headless ClusterIP service; limited to one Hazelcast cluster per service |
Using Kubernetes in API Mode
In Kubernetes API mode, each node makes a REST call to the Kubernetes master to discover the IP addresses of any Hazelcast members running in Pods.
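As a rough illustration of what such a REST call looks like, the sketch below builds (but does not send) a request that lists the Endpoints objects of a namespace through the standard Kubernetes path /api/v1/namespaces/NAMESPACE/endpoints. The class and method names are hypothetical and this is not Hazelcast's own implementation; a real call would also need to trust the cluster's CA certificate.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: shows the shape of an API-mode discovery request.
final class EndpointsRequestSketch {

    // Builds a GET request for the Endpoints objects in the given namespace.
    // masterUrl is the Kubernetes master URL, apiToken a service account token.
    static HttpRequest listEndpoints(String masterUrl, String namespace, String apiToken) {
        URI uri = URI.create(masterUrl + "/api/v1/namespaces/" + namespace + "/endpoints");
        return HttpRequest.newBuilder(uri)
                .header("Authorization", "Bearer " + apiToken) // token-based authentication
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The response to such a request contains the Pod IP addresses behind each service, which is the information a member needs to find its peers.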
Granting Permissions to use Kubernetes API
To use the Kubernetes API, you must grant certain permissions.
To grant them for 'default' service account in 'default' namespace, execute the following command.
kubectl apply -f https://raw.githubusercontent.com/hazelcast/hazelcast/master/kubernetes-rbac.yaml
Creating a Service
Hazelcast Kubernetes Discovery requires a service, of any type, for the Pods where Hazelcast is running.
apiVersion: v1
kind: Service
metadata:
  name: MY-SERVICE-NAME
spec:
  type: LoadBalancer
  selector:
    app: APP-NAME
  ports:
    - name: hazelcast
      port: 5701
If the service exposes multiple ports, the port that exposes the Hazelcast member must be named "hazelcast". This configuration ensures that auto-discovery functions correctly.
Hazelcast Configuration
The second step is to configure the discovery plugin inside hazelcast.yaml or an equivalent Java-based configuration.
hazelcast:
  network:
    join:
      multicast:
        enabled: false
      kubernetes:
        enabled: true
        namespace: MY-KUBERNETES-NAMESPACE
        service-name: MY-SERVICE-NAME

config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
config.getNetworkConfig().getJoin().getKubernetesConfig().setEnabled(true)
      .setProperty("namespace", "MY-KUBERNETES-NAMESPACE")
      .setProperty("service-name", "MY-SERVICE-NAME");
There are several properties to configure, all of which are optional.
- namespace: Kubernetes namespace where Hazelcast is running; if not specified, the value is taken from the environment variables KUBERNETES_NAMESPACE or OPENSHIFT_BUILD_NAMESPACE. If those are not set, the namespace of the Pod is used (retrieved from /var/run/secrets/kubernetes.io/serviceaccount/namespace).
- service-name: service name used to scan only Pods connected to the given service; if not specified, all Pods in the namespace are checked. If you don't specify service-name and fall back to namespace-only discovery, all Pods in the namespace must be Hazelcast member Pods; other Pods might block the member discovery process of the Hazelcast member Pods.
- service-label-name, service-label-value: service label and value used to tag services, which together form the Hazelcast cluster. These properties can support multiple comma-separated values, for example "label-1,label-2". You must use the same number of elements in service-label-name as in service-label-value.
- pod-label-name, pod-label-value: Pod label and value used to tag Pods, which together form the Hazelcast cluster. These properties can support multiple comma-separated values, for example "label-1,label-2". You must use the same number of elements in pod-label-name as in pod-label-value.
- resolve-not-ready-addresses: if set to true, the addresses of Pods that are not ready are also checked; true by default.
- expose-externally: if set to true, discovery fails fast if an external address cannot be found for each member; if set to false, it does not check for external member addresses; by default it tries to resolve external addresses but fails silently. If your service type is NodePort, there might be connectivity problems from outside to the Kubernetes cluster. Therefore, it's crucial to verify that NodePort IPs are externally accessible.
- service-per-pod-label-name, service-per-pod-label-value: service label and value used to tag services that expose each Hazelcast member with a separate Kubernetes service (for connecting a Hazelcast Smart Client running outside the Kubernetes cluster).
- use-node-name-as-external-address: if set to true, uses the node name to connect to a NodePort service instead of looking up the external IP using the API; false by default.
- kubernetes-api-retries: number of retries in case of issues while connecting to the Kubernetes API; defaults to 3.
- kubernetes-master: URL of the Kubernetes master; https://kubernetes.default.svc by default.
- api-token: API token for the Kubernetes API; if not specified, the value is taken from the file /var/run/secrets/kubernetes.io/serviceaccount/token.
- ca-certificate: CA certificate for the Kubernetes API; if not specified, the value is taken from the file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt.
- service-port: endpoint port of the service; if specified with a value greater than 0, it overrides the default; 0 by default.
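The fallback order described for the namespace property can be sketched as follows. This is only an illustration of the documented order, not Hazelcast's code; the class name, method names, and the supplier used for the service account file are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Supplier;

// Illustrative sketch of the documented namespace fallback order.
final class NamespaceResolutionSketch {

    // Order: explicit property, KUBERNETES_NAMESPACE, OPENSHIFT_BUILD_NAMESPACE,
    // then the namespace of the Pod itself (supplied lazily).
    static String resolveNamespace(String configured, Map<String, String> env,
                                   Supplier<String> podNamespace) {
        if (isSet(configured)) {
            return configured;
        }
        for (String name : new String[] {"KUBERNETES_NAMESPACE", "OPENSHIFT_BUILD_NAMESPACE"}) {
            String value = env.get(name);
            if (isSet(value)) {
                return value;
            }
        }
        return podNamespace.get();
    }

    // Reads the Pod's namespace from the mounted service account directory.
    static String fromServiceAccountFile() {
        Path file = Path.of("/var/run/secrets/kubernetes.io/serviceaccount/namespace");
        try {
            return Files.readString(file).trim();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static boolean isSet(String value) {
        return value != null && !value.isBlank();
    }
}
```

Inside a Pod, a call could look like `NamespaceResolutionSketch.resolveNamespace(null, System.getenv(), NamespaceResolutionSketch::fromServiceAccountFile)`.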
You can use one of the service-name, service-label (service-label-name, service-label-value), and pod-label (pod-label-name, pod-label-value) based discovery mechanisms; configuring two of them at once does not make sense.
If you don't specify any property at all, then the Hazelcast cluster is formed using all Pods in your current namespace. In other words, you can look at the properties as a grouping feature if you want to have multiple Hazelcast clusters in one namespace.
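The two rules above (use a single discovery mechanism, and keep label names and values the same length) can be checked before starting a member. The following is a minimal sketch under that reading; the class and method names are hypothetical, and the pairing check deliberately runs only when both the name and the value list are provided.

```java
import java.util.Map;

// Illustrative pre-flight check for Kubernetes discovery properties.
final class DiscoveryPropertiesCheck {

    // Counts how many of the three grouping mechanisms are configured.
    static int mechanismsConfigured(Map<String, String> props) {
        int count = 0;
        if (props.containsKey("service-name")) {
            count++;
        }
        if (props.containsKey("service-label-name") || props.containsKey("service-label-value")) {
            count++;
        }
        if (props.containsKey("pod-label-name") || props.containsKey("pod-label-value")) {
            count++;
        }
        return count;
    }

    // Compares the number of comma-separated elements in names and values.
    // When either side is missing, the comparison is skipped.
    static boolean labelsPaired(String names, String values) {
        if (names == null || values == null) {
            return true;
        }
        return names.split(",", -1).length == values.split(",", -1).length;
    }
}
```

A result greater than 1 from `mechanismsConfigured` signals that more than one mechanism has been configured at once.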
Using Kubernetes in DNS Lookup Mode
DNS Lookup mode relies on a Kubernetes feature: a headless service (one without a cluster IP) is assigned a DNS record that resolves to the set of IPs of its related Pods.
Creating Headless Service
A headless service is a service of type ClusterIP with the clusterIP property set to None.
apiVersion: v1
kind: Service
metadata:
  name: MY-SERVICE-NAME
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: APP-NAME
  ports:
    - name: hazelcast
      port: 5701
Hazelcast Configuration
The Hazelcast configuration to use DNS Lookup looks as follows.
hazelcast:
  network:
    join:
      kubernetes:
        enabled: true
        service-dns: MY-SERVICE-DNS-NAME

config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
config.getNetworkConfig().getJoin().getKubernetesConfig().setEnabled(true)
      .setProperty("service-dns", "MY-SERVICE-DNS-NAME");
There are 3 properties to configure the plugin:
- service-dns (required): service DNS, usually in the form of SERVICE-NAME.NAMESPACE.svc.cluster.local
- service-dns-timeout (optional): custom time for how long the DNS Lookup is checked
- service-port (optional): the Hazelcast port; if specified with a value greater than 0, it overrides the default (default port = 5701)
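To make the mechanism concrete, the sketch below builds a service DNS name in the usual form and resolves it to member addresses with the JDK's own resolver. It is an illustration only: the class and method names are hypothetical, the cluster.local suffix is assumed to be the cluster domain, and the resolution step only returns Pod IPs when run inside a cluster with a matching headless service.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of DNS-based member lookup.
final class DnsLookupSketch {

    // Builds the usual DNS name of a service (assumes the cluster.local domain).
    static String serviceDns(String serviceName, String namespace) {
        return serviceName + "." + namespace + ".svc.cluster.local";
    }

    // Resolves every address behind the DNS name and appends the member port.
    // For a headless service the records point at the individual Pods.
    // IPv6 addresses would need bracketing, which is omitted here.
    static List<String> resolveMembers(String serviceDns, int port) throws UnknownHostException {
        List<String> members = new ArrayList<>();
        for (InetAddress address : InetAddress.getAllByName(serviceDns)) {
            members.add(address.getHostAddress() + ":" + port);
        }
        return members;
    }
}
```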
Partitioning to Prevent Data Loss
By default, Hazelcast distributes partition replicas (backups) randomly and equally among cluster members. However, this is not safe in terms of high availability when a partition and its replicas are stored on the same rack or share the same network or power source. To deal with that, Hazelcast offers logical partition grouping, so that a partition and its backups are not stored within the same group. This way Hazelcast guarantees that a possible failure affecting more than one member at a time will not cause data loss. For more details about partition groups, see Partition Group Configuration.
Zone Aware
When using ZONE_AWARE configuration, backups are created in the other availability zone. This feature is available only for the Kubernetes API mode.
Your Kubernetes cluster must orchestrate Hazelcast member Pods equally between the availability zones, otherwise the Zone Aware feature may not work correctly.
partition-group:
  enabled: true
  group-type: ZONE_AWARE

config.getPartitionGroupConfig()
      .setEnabled(true)
      .setGroupType(MemberGroupType.ZONE_AWARE);
Note the following aspects of ZONE_AWARE:
- The Kubernetes cluster must provide the well-known Kubernetes annotations.
- Retrieving the zone name uses the Kubernetes API, so RBAC must be configured.
- The ZONE_AWARE feature works correctly when Hazelcast members are distributed equally in all zones, so your Kubernetes cluster must orchestrate Pods equally.
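One way to reason about the equal-distribution requirement is to count members per zone and compare the counts. The sketch below does exactly that for a list of zone names, one per member; it is a hypothetical helper for checking a deployment plan, not part of Hazelcast, and the same idea applies to nodes instead of zones.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Illustrative check that members are spread equally across zones.
final class ZoneBalanceSketch {

    // Counts members per zone; the input holds one zone name per member.
    static Map<String, Integer> membersPerZone(Collection<String> zoneOfEachMember) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String zone : zoneOfEachMember) {
            counts.merge(zone, 1, Integer::sum);
        }
        return counts;
    }

    // True when at least two zones are used and every zone has the same count.
    static boolean isEvenlySpread(Collection<String> zoneOfEachMember) {
        Map<String, Integer> counts = membersPerZone(zoneOfEachMember);
        if (counts.size() < 2) {
            return false; // backups need another zone to go to
        }
        return Collections.min(counts.values()).equals(Collections.max(counts.values()));
    }
}
```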
Note also that retrieving Zone Name assumes that your container’s hostname is the same as Pod Name, which is almost always true. If you happen to change your hostname in the container, then please define the following environment variable:
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
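The effect of the POD_NAME variable can be pictured as a simple preference order: use POD_NAME when it is defined, otherwise fall back to the container's hostname. The sketch below is an assumption about that order for illustration, with hypothetical names; it is not taken from Hazelcast's source.

```java
import java.util.Map;

// Illustrative sketch: prefer POD_NAME, fall back to the hostname.
final class PodNameSketch {

    static String podName(Map<String, String> env, String hostname) {
        String fromEnv = env.get("POD_NAME");
        return (fromEnv != null && !fromEnv.isBlank()) ? fromEnv : hostname;
    }
}
```

In a running container this could be called as `PodNameSketch.podName(System.getenv(), java.net.InetAddress.getLocalHost().getHostName())`.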
Node Aware
When using NODE_AWARE configuration, backups are created in the other Kubernetes nodes. This feature is available only for the Kubernetes API mode.
Your Kubernetes cluster must orchestrate Hazelcast member Pods equally between the nodes, otherwise the Node Aware feature may not work correctly.
Java-based Configuration
config.getPartitionGroupConfig()
      .setEnabled(true)
      .setGroupType(MemberGroupType.NODE_AWARE);
Note the following aspects of NODE_AWARE:
- Retrieving the name of the node uses the Kubernetes API, so RBAC must be configured.
- The NODE_AWARE feature works correctly when Hazelcast members are distributed equally in all nodes, so your Kubernetes cluster must orchestrate Pods equally.
Note also that retrieving name of the node assumes that your container’s hostname is the same as Pod Name, which is almost always true. If you happen to change your hostname in the container, then please define the following environment variable:
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
Preventing Data Loss During Upgrades
By default, Hazelcast does not shut down gracefully. As a result, if you suddenly terminate more members than your configured backup-count property (1 by default), you may lose the cluster data.
To prevent data loss, set the following properties.
All these properties are already set in Hazelcast Operator.
- terminationGracePeriodSeconds: in your StatefulSet (or Deployment) configuration; the value should be high enough to cover the data migration process.
- -Dhazelcast.shutdownhook.policy=GRACEFUL: in the JVM parameters.
- -Dhazelcast.graceful.shutdown.max.wait: in the JVM parameters; the value should be high enough to cover the data migration process.
- If you use Deployment (not StatefulSet), you need to set your strategy to RollingUpdate and ensure Pods are updated one by one.
- If you upgrade by the minor version, e.g., 3.11.4 ⇒ 3.12 (Enterprise feature), you need to set the -Dhazelcast.cluster.version.auto.upgrade.enabled=true JVM property to make sure the cluster version updates automatically.
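The settings above interact: Kubernetes stops waiting once terminationGracePeriodSeconds has elapsed, so the grace period should not be shorter than the time Hazelcast is allowed to spend shutting down. The sketch below assembles the two JVM parameters and checks that relationship. The helper names are hypothetical, and it assumes hazelcast.graceful.shutdown.max.wait is expressed in seconds.

```java
import java.util.List;

// Illustrative helpers for graceful shutdown settings.
final class GracefulShutdownSketch {

    // JVM parameters for a graceful shutdown (max wait assumed to be in seconds).
    static List<String> jvmArgs(int maxWaitSeconds) {
        return List.of(
                "-Dhazelcast.shutdownhook.policy=GRACEFUL",
                "-Dhazelcast.graceful.shutdown.max.wait=" + maxWaitSeconds);
    }

    // The Pod's grace period should cover the whole graceful shutdown.
    static boolean gracePeriodCoversShutdown(int terminationGracePeriodSeconds, int maxWaitSeconds) {
        return terminationGracePeriodSeconds >= maxWaitSeconds;
    }

    // Data may be lost when more members vanish at once than there are backups.
    static boolean mayLoseData(int membersTerminatedAtOnce, int backupCount) {
        return membersTerminatedAtOnce > backupCount;
    }
}
```

For example, `gracePeriodCoversShutdown(30, 600)` returns false, which suggests that a 30-second grace period would cut a 600-second graceful shutdown short.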
Discovering Members from Hazelcast Clients
For the client to discover the Hazelcast cluster, all it needs to know is the address by which the cluster is accessible.
Inside Kubernetes Cluster
If you have a Hazelcast cluster and a Hazelcast client deployed on the same Kubernetes cluster, you should use the Kubernetes service name in the client’s configuration.
hazelcast-client:
  network:
    cluster-members:
      - MY-SERVICE-NAME

clientConfig.getNetworkConfig().addAddress("MY-SERVICE-NAME");

const clientConfig = {
    network: {
        clusterMembers: [
            'MY-SERVICE-NAME'
        ]
    }
};

client = hazelcast.HazelcastClient(
    cluster_members=["MY-SERVICE-NAME"],
)

config.get_network_config().add_address({"MY-SERVICE-NAME", 5701})

config.Cluster.Network.SetAddresses("MY-SERVICE-NAME:5701")
For the complete example, please check Hazelcast Guides: Hazelcast for Kubernetes.
Outside Kubernetes Cluster
If your Hazelcast cluster is deployed on Kubernetes, but your Hazelcast client is in a completely different network, then it can connect only through the public Internet. This requires exposing each Hazelcast member pod with a dedicated NodePort or LoadBalancer Kubernetes service. For details and a complete example, please check Hazelcast Guides: Connect External Hazelcast Client to Kubernetes.
Running Hazelcast Enterprise with Persistence under Kubernetes
Hazelcast Enterprise members configured with persistence enabled can monitor the Kubernetes context and automate Hazelcast cluster state management in order to ensure the optimal cluster behaviour during shutdown and restart. Specifically:
- During a cluster-wide shutdown, the Hazelcast cluster automatically switches to PASSIVE cluster state. Advantages, compared to the behavior in previous Hazelcast Platform releases (Hazelcast always executing in the default ACTIVE state):
  - No data migrations are performed, speeding up the cluster shutdown.
  - No risk of out-of-memory exception due to migrations: with the ACTIVE state and Kubernetes applying the ordered shutdown of members, all data would eventually be migrated to a single member (the last one in the sequence of shutdown). Therefore, for previous Hazelcast Platform releases, it was required to plan capacity for a single member to hold all the cluster data, or risk an out-of-memory exception.
  - Persisted cluster metadata remain consistent across all members during shutdown. This consistency allows recovery from disk to proceed without unexpected metadata validation errors. These errors may result in performing a force-start, wiping out persistent data from one or more members.
- During temporary loss of members, e.g., rolling restart of the cluster or a Pod being rescheduled by Kubernetes, the Hazelcast cluster switches to a configurable cluster state (FROZEN or NO_MIGRATION) to ensure speedy recovery when the member rejoins the cluster.
- When scaling up or down, Hazelcast automatically switches to ACTIVE cluster state, so partitions are rebalanced and data is spread across all members.
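The three cases above amount to a small decision table from the detected situation to a target cluster state. The sketch below restates that table; the enum and method names are hypothetical and it does not reproduce how Hazelcast detects these situations.

```java
// Illustrative decision table for automatic cluster state management.
final class AutoClusterStateSketch {

    enum Event { CLUSTER_WIDE_SHUTDOWN, TEMPORARY_MEMBER_LOSS, SCALE_UP_OR_DOWN }

    enum ClusterState { ACTIVE, NO_MIGRATION, FROZEN, PASSIVE }

    // configuredStrategy is the state chosen for temporary member loss.
    static ClusterState stateFor(Event event, ClusterState configuredStrategy) {
        switch (event) {
            case CLUSTER_WIDE_SHUTDOWN:
                return ClusterState.PASSIVE;
            case TEMPORARY_MEMBER_LOSS:
                if (configuredStrategy != ClusterState.FROZEN
                        && configuredStrategy != ClusterState.NO_MIGRATION) {
                    throw new IllegalArgumentException("strategy must be FROZEN or NO_MIGRATION");
                }
                return configuredStrategy;
            case SCALE_UP_OR_DOWN:
                return ClusterState.ACTIVE;
            default:
                throw new IllegalStateException("unknown event: " + event);
        }
    }
}
```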
Requirements
Automatic cluster state management requires:
- Hazelcast is configured with persistence enabled.
- Kubernetes discovery is in use, either explicitly configured or as auto-detected join configuration.
- Hazelcast is deployed in a StatefulSet.
- Hazelcast is executed with a cluster role that is allowed access to the apps Kubernetes API group and statefulsets resources with the watch verb. See the proposed ClusterRole configuration.
Configuration
Automatic cluster state management is enabled by default when Hazelcast is configured with persistence enabled and Kubernetes discovery. It can be explicitly disabled by setting the hazelcast.persistence.auto.cluster.state Hazelcast property to false.
Depending on the use case, the hazelcast.persistence.auto.cluster.state.strategy Hazelcast property configures which cluster state will be used when members are temporarily missing from the cluster, e.g., a Pod is rescheduled or a rolling restart is in progress. This property has the following values:
- NO_MIGRATION: Use this as the cluster state if the cluster hosts a mix of persistent and in-memory data structures, or if the cluster hosts only persistent data but you favour availability over speed of recovery. While a member is missing from the cluster, the cluster state switches to NO_MIGRATION: this way, the first replicas of the partitions owned by the missing member are promoted and the data are available. When the member rejoins the cluster, persistent data can be recovered from disk and a differential sync (assuming Merkle trees are enabled, which is the case by default for persistent IMap and ICache) brings them up to speed. For in-memory data structures, a full partition sync is required. This is the default value.
- FROZEN: Use this cluster state if your cluster hosts only persistent data structures and you do not mind temporarily losing availability of partitions owned by a missing member in exchange for a speedy recovery from disk. Once the member rejoins, no sync over the network is required.
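The guidance for choosing between the two values can be condensed into a single rule: FROZEN only suits clusters that host nothing but persistent data and accept a temporary loss of availability, and NO_MIGRATION covers every other case. The sketch below encodes that rule and formats the corresponding JVM parameter; the class and method names are hypothetical, and passing the property as a -D system property is assumed to be acceptable for your deployment.

```java
// Illustrative helper for picking the auto cluster state strategy.
final class StateStrategyChoice {

    // FROZEN only when all data is persistent and availability may be traded
    // for faster recovery; otherwise NO_MIGRATION (the documented default).
    static String chooseStrategy(boolean onlyPersistentData, boolean favourAvailability) {
        return (onlyPersistentData && !favourAvailability) ? "FROZEN" : "NO_MIGRATION";
    }

    // Formats the Hazelcast property as a JVM parameter.
    static String jvmArg(String strategy) {
        return "-Dhazelcast.persistence.auto.cluster.state.strategy=" + strategy;
    }
}
```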
Best practices
When running Hazelcast with persistence in Kubernetes, the following configuration is recommended:
- Use /hazelcast/health/node-state as liveness probe and /hazelcast/health/ready as readiness probe.
- In your persistence configuration:
  - Do not set a non-zero value for the rebalance-delay-seconds property. Automatic cluster state management deals with switching to the appropriate cluster state, which may or may not allow partition rebalancing to occur depending on the detected status of the cluster.
  - Use PARTIAL_RECOVERY_MOST_COMPLETE as cluster-data-recovery-policy and set auto-remove-stale-data to true, to minimize the need for manual interventions. Even in edge cases in which a member performs force-start (i.e. cleans its persistent data directory and rejoins the cluster with a new identity), data can usually be recovered from backup partitions (assuming IMaps and ICaches are configured with 1 or more backups).
- Start Hazelcast with the -Dhazelcast.stale.join.prevention.duration.seconds=5 Java option. Since Kubernetes may quickly reschedule Pods, the default value of 30 seconds to avoid a stale join request being processed is too high and may cause unnecessary delays in members rejoining the cluster.