A newer version of Hazelcast Platform is available.

View latest

Kubernetes Auto Discovery

Before you start a Hazelcast cluster in Kubernetes, it’s important to configure settings to ensure that members can form a cluster and to prevent any unexpected data loss.

Discovering Members

To make it easier to set up clusters in Kubernetes, Hazelcast allows members to discover each other automatically, using discovery modes. Configure your members with one of the following discovery modes to allow them to form a cluster:

Kubernetes API DNS Lookup

Description

Uses REST calls to Kubernetes master to fetch IP addresses of Pods

Uses DNS to resolve IPs of Pods related to the given service

Pros

Flexible, supports 3 different options:

  • Hazelcast cluster per service

  • Hazelcast cluster per multiple services (distinguished by labels)

  • Hazelcast cluster per namespace

No additional configuration required, resolving DNS does not require granting any permissions

Cons

Requires setting up RoleBinding (to allow access to Kubernetes API)

Limited to headless Cluster IP service

Limited to Hazelcast cluster per service

Using Kubernetes in API Mode

In Kubernetes API mode, each node makes a REST call to the Kubernetes master to discover the IP addresses of any Hazelcast members running in Pods.

Granting Permissions to use Kubernetes API

To use the Kubernetes API, you must grant certain permissions.

To grant them for 'default' service account in 'default' namespace, execute the following command.

kubectl apply -f https://raw.githubusercontent.com/hazelcast/hazelcast/master/kubernetes-rbac.yaml

Creating a Service

Hazelcast Kubernetes Discovery requires creating a service of any type in any Pods where Hazelcast is running.

apiVersion: v1
kind: Service
metadata:
  name: MY-SERVICE-NAME
spec:
  type: LoadBalancer
  selector:
    app: APP-NAME
  ports:
  - name: hazelcast
    port: 5701
If the service exposes multiple ports, name the port that exposes the Hazelcast member hazelcast. This configuration ensures that auto-discovery functions correctly.

Hazelcast Configuration

The second step is to configure the discovery plugin inside hazelcast.yaml or an equivalent Java-based configuration.

  • YAML

  • Java

hazelcast:
  network:
    join:
      multicast:
        enabled: false
      kubernetes:
        enabled: true
        namespace: MY-KUBERNETES-NAMESPACE
        service-name: MY-SERVICE-NAME
config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
config.getNetworkConfig().getJoin().getKubernetesConfig().setEnabled(true)
      .setProperty("namespace", "MY-KUBERNETES-NAMESPACE")
      .setProperty("service-name", "MY-SERVICE-NAME");

There are several properties to configure, all of which are optional.

  • namespace: Kubernetes Namespace where Hazelcast is running; if not specified, the value is taken from the environment variables KUBERNETES_NAMESPACE or OPENSHIFT_BUILD_NAMESPACE. If those are not set, the namespace of the Pod will be used (retrieved from /var/run/secrets/kubernetes.io/serviceaccount/namespace).

  • service-name: service name used to scan only Pods connected to the given service; if not specified, then all Pods in the namespace are checked

    If you don’t specify service-name and fall back to a namespace only discovery, all pods in the namespace must be Hazelcast member pods; other pods might block the member discovery process of the Hazelcast member pods.
  • service-label-name, service-label-value: service label and value used to tag services, which together form the Hazelcast cluster. These properties can support multiple comma-separated values. For example: "label-1,label-2". You must use the same number of elements in service-label-name as service-label-value.

  • pod-label-name, pod-label-value: pod label and value used to tag Pods, which together form the Hazelcast cluster. These properties can support multiple comma-separated values. For example: "label-1,label-2". You must use the same number of elements in pod-label-name as pod-label-value.

  • resolve-not-ready-addresses: if set to true, it checks also the addresses of Pods which are not ready; true by default

  • expose-externally: if set to true, it fails fast if an external address cannot be found for each member; if set to false, it does not check for external member addresses; by default it tries to resolve external addresses but fails silently

    If your service type is NodePort, there might be connectivity problems from outside to the Kubernetes cluster. Therefore, it’s crucial to verify that NodePort IPs are externally accessible.
  • service-per-pod-label-name, service-per-pod-label-value: service label and value used to tag services that expose each Hazelcast member with a separate Kubernetes service (for connecting Hazelcast Smart Client running outside the Kubernetes cluster)

  • use-node-name-as-external-address: if set to true, uses the node name to connect to a NodePort service instead of looking up the external IP using the API; false by default

  • kubernetes-api-retries: number of retries in case of issues while connecting to Kubernetes API; defaults to 3

  • kubernetes-master: URL of Kubernetes Master; https://kubernetes.default.svc by default

  • api-token: API Token to Kubernetes API; if not specified, the value is taken from the file /var/run/secrets/kubernetes.io/serviceaccount/token

  • ca-certificate: CA Certificate for Kubernetes API; if not specified, the value is taken from the file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

  • service-port: endpoint port of the service; if specified with a value greater than 0, it overrides the default; 0 by default

You can use one of service-name,service-label(service-label-name, service-label-value) and pod-label(pod-label-name, pod-label-value) based discovery mechanisms, configuring two of them at once does not make sense.

If you don’t specify any property at all, then the Hazelcast cluster is formed using all Pods in your current namespace. In other words, you can look at the properties as a grouping feature if you want to have multiple Hazelcast clusters in one namespace.

Using Kubernetes in DNS Lookup Mode

DNS Lookup mode uses a feature of Kubernetes that headless (without cluster IP) services are assigned a DNS record which resolves to the set of IPs of related Pods.

Creating Headless Service

Headless service is a service of type ClusterIP with the clusterIP property set to None.

apiVersion: v1
kind: Service
metadata:
  name: MY-SERVICE-NAME
spec:
  type: ClusterIP
  clusterIP: None
  selector:
    app: APP-NAME
  ports:
  - name: hazelcast
    port: 5701

Hazelcast Configuration

The Hazelcast configuration to use DNS Lookup looks as follows.

  • YAML

  • Java

hazelcast:
  network:
    join:
      kubernetes:
        enabled: true
        service-dns: MY-SERVICE-DNS-NAME
config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
config.getNetworkConfig().getJoin().getKubernetesConfig().setEnabled(true)
      .setProperty("service-dns", "MY-SERVICE-DNS-NAME");

There are 3 properties to configure the plugin:

  • service-dns (required): service DNS, usually in the form of SERVICE-NAME.NAMESPACE.svc.cluster.local

  • service-dns-timeout (optional): custom time for how long the DNS Lookup is checked

  • service-port (optional): the Hazelcast port; if specified with a value greater than 0, it overrides the default (default port = 5701)

Partitioning to Prevent Data Loss

By default, Hazelcast distributes partition replicas (backups) randomly and equally among cluster members. However, this is not safe in terms of high availability when a partition and its replicas are stored on the same rack, using the same network, or power source. To deal with that, Hazelcast offers logical partition grouping, so that a partition itself and its backups would not be stored within the same group. This way Hazelcast guarantees that a possible failure affecting more than one member at a time will not cause data loss. For more details about partition groups, see Partition Group Configuration.

Zone Aware

When using ZONE_AWARE configuration, backups are created in the other availability zone. This feature is available only for the Kubernetes API mode.

Your Kubernetes cluster must orchestrate Hazelcast Member Pods equally between the availability zones, otherwise Zone Aware feature may not work correctly.
  • YAML

  • Java

partition-group:
  enabled: true
  group-type: ZONE_AWARE
config.getPartitionGroupConfig()
    .setEnabled(true)
    .setGroupType(MemberGroupType.ZONE_AWARE);

Note the following aspects of ZONE_AWARE:

  • Kubernetes cluster must provide the well-known Kubernetes annotations

  • Retrieving Zone Name uses Kubernetes API, so RBAC must be configured

  • ZONE_AWARE feature works correctly when Hazelcast members are distributed equally in all zones, so your Kubernetes cluster must orchestrate Pods equally

Note also that retrieving Zone Name assumes that your container’s hostname is the same as Pod Name, which is almost always true. If you happen to change your hostname in the container, then please define the following environment variable:

env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name

Node Aware

When using NODE_AWARE configuration, backups are created in the other Kubernetes nodes. This feature is available only for the Kubernetes API mode.

Your Kubernetes cluster must orchestrate Hazelcast Member Pods equally between the nodes, otherwise Node Aware feature may not work correctly.

YAML Configuration

partition-group:
  enabled: true
  group-type: NODE_AWARE

Java-based Configuration

config.getPartitionGroupConfig()
    .setEnabled(true)
    .setGroupType(MemberGroupType.NODE_AWARE);

Note the following aspects of NODE_AWARE:

  • Retrieving name of the node uses Kubernetes API, so RBAC must be configured

  • NODE_AWARE feature works correctly when Hazelcast members are distributed equally in all nodes, so your Kubernetes cluster must orchestrate Pods equally.

Note also that retrieving name of the node assumes that your container’s hostname is the same as Pod Name, which is almost always true. If you happen to change your hostname in the container, then please define the following environment variable:

env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name

Preventing Data Loss During Upgrades

By default, Hazelcast does not shutdown gracefully. As a result, if you suddenly terminate more members than your configured backup-count property (1 by default), you may lose the cluster data.

To prevent data loss, set the following properties.

All these properties are already set in Hazelcast Operator.
  • terminationGracePeriodSeconds: in your StatefulSet (or Deployment) configuration; the value should be high enough to cover the data migration process

  • -Dhazelcast.shutdownhook.policy=GRACEFUL: in the JVM parameters

  • -Dhazelcast.graceful.shutdown.max.wait: in the JVM parameters; the value should be high enough to cover the data migration process

  • If you use Deployment (not StatefulSet), you need to set your strategy to RollingUpdate and ensure Pods are updated one by one.

  • If you upgrade by the minor version, e.g., 3.11.4 ⇒ 3.12 (Enterprise feature), you need to set the -Dhazelcast.cluster.version.auto.upgrade.enabled=true JVM property to make sure the cluster version updates automatically.

Discovering Members from Hazelcast Clients

For the client to discover the Hazelcast cluster, all it needs to know is the address by which the cluster is accessible.

Inside Kubernetes Cluster

If you have a Hazelcast cluster and a Hazelcast client deployed on the same Kubernetes cluster, you should use the Kubernetes service name in the client’s configuration.

  • YAML

  • Java

  • NodeJS

  • Python

  • C++

  • Go

hazelcast-client:
  network:
    cluster-members:
      - MY-SERVICE-NAME
clientConfig.getNetworkConfig().addAddress("MY-SERVICE-NAME");
const clientConfig = {
    network: {
        clusterMembers: [
            'MY-SERVICE-NAME'
        ]
    }
};
client = hazelcast.HazelcastClient(
    cluster_members=["MY-SERVICE-NAME"],
)
config.get_network_config().add_address({"MY-SERVICE-NAME", 5701})
config.Cluster.Network.SetAddresses("MY-SERVICE-NAME:5701")

For the complete example, please check Hazelcast Guides: Hazelcast for Kubernetes.

Outside Kubernetes Cluster

If your Hazelcast cluster is deployed on Kubernetes, but your Hazelcast client is in a completely different network, then it can connect only through the public Internet. This requires exposing each Hazelcast member pod with a dedicated NodePort or LoadBalancer Kubernetes service. For details and a complete example, please check Hazelcast Guides: Connect External Hazelcast Client to Kubernetes.

Running Hazelcast Enterprise with Persistence under Kubernetes

Enterprise

Hazelcast Enterprise members configured with persistence enabled can monitor the Kubernetes context and automate Hazelcast cluster state management in order to ensure the optimal cluster behaviour during shutdown and restart. Specifically:

  • During a cluster-wide shutdown, the Hazelcast cluster automatically switches to PASSIVE cluster state. Advantages, compared to the behavior in previous Hazelcast Platform releases (Hazelcast executing always with default ACTIVE state):

    • No data migrations are performed, speeding up the cluster shutdown

    • No risk of out-of-memory exception due to migrations: with the ACTIVE state and Kubernetes applying the ordered shutdown of members, all data would eventually be migrated to a single member (the last one in the sequence of shutdown). Therefore, for previous Hazelcast Platform releases, it was required to plan capacity for a single member to hold all the cluster data, or risk an out-of-memory exception.

    • Persisted cluster metadata remain consistent across all members during shutdown. This consistency allows recovery from disk to proceed without unexpected metadata validation errors. These errors may result in performing a force-start, wiping out persistent data from one or more members.

  • During temporary loss of members, e.g., rolling restart of the cluster or a pod being rescheduled by Kubernetes, Hazelcast cluster switches to a configurable cluster state (FROZEN or NO_MIGRATION) to ensure speedy recovery when the member rejoins the cluster.

  • When scaling up or down, Hazelcast automatically switches to ACTIVE cluster state, so partitions are rebalanced and data is spread across all members.

Requirements

Automatic cluster state management requires:

  • Hazelcast configured with persistence enabled

  • Kubernetes discovery is in use, either explicitly configured or as auto-detected join configuration

  • Hazelcast is deployed in a StatefulSet

  • Hazelcast is executed with a cluster role that is allowed access to apps Kubernetes API group and statefulsets resources with the watch verb. See the proposed ClusterRole configuration

Configuration

Automatic cluster state management is enabled by default when Hazelcast is configured with persistence enabled and Kubernetes discovery. It can be explicitly disabled by setting the hazelcast.persistence.auto.cluster.state Hazelcast property to false.

Depending on the use case, the hazelcast.persistence.auto.cluster.state.strategy Hazelcast property configures which cluster state will be used when members are temporarily missing from the cluster, e.g., pod is rescheduled or rolling restart is in progress. This property has the following values:

  • NO_MIGRATION: Use this as the cluster state if the cluster hosts a mix of persistent and in-memory data structures or if the cluster hosts only persistent data, but you favour availability over speed of recovery. While a member is missing from the cluster, cluster state switches to NO_MIGRATION: this way, the first replica of partitions owned by the missing member are promoted and the data are available. When the member rejoins the cluster, persistent data can be recovered from disk and a differential sync (assuming Merkle trees are enabled, which is the case by default for persistent IMap and ICache) brings them up to speed. For in-memory data structures, a full partition sync is required. This is the default value.

  • FROZEN: Use this cluster state, if your cluster hosts only persistent data structures and you do not mind temporarily losing availability of partitions owned by a missing member in exchange for a speedy recovery from disk. Once the member rejoins, no sync over the network is required.

Best practices

When running Hazelcast with persistence in Kubernetes, the following configuration is recommended:

  • Use /hazelcast/health/node-state as liveness probe and /hazelcast/health/ready as readiness probe.

  • In your persistence configuration:

    • do not set a non-zero value for rebalance-delay-seconds property. Automatic cluster state management deals with switching to appropriate cluster state that may or may not allow partition rebalancing to occur depending on the detected status of the cluster.

    • Use PARTIAL_RECOVERY_MOST_COMPLETE as cluster-data-recovery-policy and set auto-remove-stale-data to true, to minimize the need for manual interventions. Even in edge cases in which a member performs force-start (i.e. cleans its persistent data directory and rejoins the cluster with a new identity), data can usually be recovered from backup partitions (assuming IMap s and ICache s are configured with 1 or more backups).

  • Start Hazelcast with the -Dhazelcast.stale.join.prevention.duration.seconds=5 Java option. Since Kubernetes may quickly reschedule pods, the default value of 30 seconds to avoid a stale join request being processed is too high and may cause unnecessary delays in members rejoining the cluster.