Architecture Overview

Hazelcast is a distributed computation and storage platform for consistently low-latency querying, aggregation and stateful computation against event streams and traditional data sources.

A cluster of Hazelcast nodes share both the data storage and computational load which can dynamically scale up and down. When you add new nodes to the cluster, both data and computations is automatically rebalanced across the cluster.

Hazelcast High-Level Architecture

The following are the brief descriptions of blocks in the diagram above from bottom to top.

Discovery and Clustering

In Hazelcast, data is load-balanced in-memory across a cluster. This cluster is a network of members each of which runs Hazelcast. They discover each other automatically and form a cluster; the members communicate with each other via TCP after the cluster is formed. Hazelcast supports automatic discovery on cloud environments like Amazon EC2, Google Cloud Platform and Azure; you can also configure Hazelcast to discover members by TCP/IP or multicast. In addition, you can make use of the automatic member discovery in the Kubernetes environment. See the Discovery Mechanisms section.

Fault Tolerance with Hazelcast

Hazelcast distributes your storage data, computational data, and backups, among all cluster members. This way, if a member is lost, Hazelcast can restore the data from these backups, providing continuous availability. As the data itself, its backups are distributed and stored also in the memory (RAM). The distribution happens on the partition level; the primary data and its backups are stored in the memory partitions. See Data Partitioning and Partition Grouping for more information about the partitioning.

When a member in your cluster is lost, Hazelcast redistributes the backups on the remaining members so that every partition has a backup. This makes Hazelcast resilient to data loss. The number of backups is configurable. Based on the configuration, data can be kept in multiple replicas of a partition.

High Availability with Hazelcast

In Hazelcast, cluster members monitor the health of each other. When a cluster member becomes inaccessible due to for example a network failure, other members cooperatively diagnose the state; they immediately take over the responsibility of the failed member. To determine if a member is unreachable or crashed, Hazelcast provides built-in failure detectors. See here for more information.

Besides the in-cluster failover mechanism mentioned above, Hazelcast also provides replication over WAN. You can have deployments across multiple data centers using the WAN replication mechanism which offers protection against a data center or wider network failures. See here for more information.

AP/CP

In the context of CAP principle, Hazelcast offers AP (Availability and Partition Tolerance) and CP (Consistency and Partition Tolerance) functionalities with different data structure implementations. Let’s recall what these functionalities mean:

Availability: All working members in a distributed system return a valid response to any request, without exceptions.
Consistency: All the clients connected to the system see the same data at the same time, no matter which member they connect to; whenever data is written to a member, it is instantly replicated to all the other members in the system.
Partition Tolerance: Partition refers to a lost or temporarily delayed connection between the members; partition tolerance refers to continued work of the cluster despite any number of communication breakdowns between the members of the cluster.

Hazelcast as an AP system delivers availability and partition tolerance at the expense of consistency. When a communication breakdown (partition) occurs, all members remain available; however some members affected by the partition might return an older version of data than others. When the partition is resolved, the Hazelcast typically synchronizes the members to repair all inconsistencies in the system.

As a CP system, it delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any members, Hazelcast makes the non-consistent member unavailable until the partition is resolved.

Data structures exposed under HazelcastInstance API are all AP data structures and the ones accessed via HazelcastInstance.getCPSubsytem() provides CP structures and APIs, which are built on the Raft consensus algorithm. See the Consistency and Replication Model section and CP Subsystem section.

Storage Engine

Storage engine provides caching and data processing on top of Hazelcast’s key/value store IMap. You can load and store data from/to external data sources, control the eviction of data entries and execute your codes on them, use Hazelcast’s specification-compliant JCache implementation and integrate your cache with Spring.

Jet (Streaming) Engine

Jet engine runs the streaming applications. For this, it uses many different connectors (sources and sinks) to get and output data, and stateless/stateful transforms with operators including join, aggregate and sort to process data: the usage of connectors and operators constitute a data pipeline.

A data pipeline is a series of processing steps that consist of a source, one or more processing steps, and a sink (destination). It may enable the flow of data from an application to a data warehouse, from a data lake to an analytics database, or into a payment processing system. Hazelcast allows you to create data pipelines, using either SQL or the Hazelcast Java API. See the Building Data Pipelines section.

Distributed System Tools

Hazelcast offers distributed implementations of standard collections, concurrency utilities and publish/subscribe messaging model.

Standard collection implementations include Maps, Queues, Lists, Sets and Ringbuffers. Concurrency utilities include AtomicLongs, AtomicReferences, Semaphores and CountDownLatches. It also provides a broadcast messaging system based on publish/subscribe model; it lets applications communicate in real time at high speed. Your applications can write ("publish") to specific channels, called "topic", and then one or more subscribers can read the messages from that topic. See the Distributed Data Structures section.

Hazelcast also provides a CP subsystem for distributed coordination use cases such as leader election, distributed locking, synchronization and metadata management. See the CP Subsystem section.

Querying with SQL

Hazelcast’s SQL engine lets you query static and streaming data. You can load static data from sources (files and maps) to transform and analyze it. Hazelcast can also load and query real-time streaming data as it is being generated, which is ideal for use cases that require complex queries with low latency, e.g., fraud detection. Besides performing transformations and queries, you can store your transform/query results in one or more systems; it is useful for sending results to other systems or caching results in Hazelcast to avoid running redundant queries.