Hazelcast Overview
Hazelcast is a distributed computation and storage platform for consistently low-latency querying, aggregation and stateful computation against event streams and traditional data sources. Hazelcast Platform uniquely combines a distributed compute engine and a fast data store in one runtime. It offers unmatched performance, resilience and scale for real-time and AI-driven applications. It allows you to quickly build resource-efficient, real-time applications. You can deploy it at any scale from small edge devices to a large cluster of cloud instances.
Hazelcast can process data on a set of networked and clustered computers that pool together their random access memories (RAM) to let applications share data with other applications running in the cluster. When data is stored in RAM, applications run a lot faster since it does not need to be retrieved from disk and put into RAM prior to processing. Using Hazelcast, you can store and process your data in RAM, spread and replicate it across a cluster of machines; replication gives you resilience to failures of cluster members.
Hazelcast is implemented in Java language and has clients for Java, C++, .NET, REST, Python, Go and Node.js. Hazelcast also speaks Memcached and REST protocols.
Your cloud-native applications can easily use Hazelcast. It is flexible enough to use as a data and computing platform out-of-the-box or as a framework for your own cloud-native applications and microservices.
Hazelcast is designed to be lightweight and easy to use. Since it is delivered as a compact library, it easily plugs into your software solution.
It is designed to scale up to hundreds of members and thousands of clients. When you add new members, they automatically discover the cluster and linearly increase both the memory and processing capacity. The members maintain a TCP connection between each other and all communication is performed through this layer. Each cluster member is configured to be the same in terms of functionality. The oldest member (the first member created in the cluster) automatically performs the stored and streaming data assignment to cluster members. If the oldest member dies, the second oldest member takes over.
Hazelcast offers simple scalability, partitioning (sharding), and re-balancing out-of-the-box. It does not require any extra coordination processes. NoSQL and traditional databases are difficult to scale out and manage. They require additional processes for coordination and high availability. With Hazelcast, when you start another process to add more capacity, data and backups are automatically and evenly balanced.
What Can You Do with Hazelcast?
You can request data, listen to events, submit data processing tasks using Hazelcast clients connected to a cluster. Hazelcast has clients implemented in Java, .Net, C++, Node.js, Go and Python languages. It also communicates with Memcache and REST protocols. See the Hazelcast Clients section.
You can build data pipelines using SQL or the Java API which enable the data to flow from an application to a data source or from a data source to an analytics database. A very simple example can be reading words from a file, converting them into all-uppercase, and output the results to your console. See the Building Data Pipelines section.
You can import data from databases, files, messaging systems, on-premise and cloud systems in various formats (data ingestion). Hazelcast offers pipelines and loading/storing interfaces for this purpose. See the Ingesting Data section.
You can run queries on the data using SQL in your maps or external systems like Apache Kafka. You can also use the predicates API to filter and retrieve data in your maps. See the Distributed Queries section.
You can run computational tasks on different cluster members (distributed computing); for this you can use the pipelines, entry processors, and executor services. See Distributed Computing section.
You can store your data using the distributed implementation of various data structures like maps, caches, queues, topics, concurrency utilities. See the Distributed Data Structures section.
You can use it as a distributed second level cache for your Hibernate entities, collections and queries. Also, Hazelcast can be used as a cache for applications based on the Spring Framework for distributing shared data in real time.
You can have multiple Hazelcast clusters at different locations in sync by replicating their state over WAN environments. See the Synchronizing Data Across a WAN section.
You can listen to the events happening in the cluster, on the data structures and clients so that you are notified when those events happen. See the Distributed Events section.
Please see the Develop Solutions chapter for all the scenarios you can realize using Hazelcast.
The following are example use cases:
-
Increasing the transactions per second and system uptime in payment processing
-
Authorizing and authenticating the credit cards using multiple algorithms within milliseconds for fraud detection
-
Decreasing order processing times with low latencies in e-commerce
-
Being a real-time streamer to detect application performance
-
Clustering highly changing data with event notifications, e.g., user based events, and queueing and distributing background tasks
-
Being a distributed topic (publish/subscribe server) to build scalable chat servers for smartphones
-
Constructing a strongly consistent layer using its CP (CP with respect the CAP principle) subsystem built on top of the Raft consensus algorithm
-
Distributing user object states across the cluster, to pass messages between objects and to share system data structures (static initialization state, mirrored objects, object identity generators)
-
Being a multi-tenancy cache where each tenant has its own stored data
-
Sharing datasets, e.g., table-like data structure, to be used by applications
-
Storing session data in web applications (enabling horizontal scalability of the web application)
The applications call for stored data from data sources, e.g., databases, and these sources are slow since they are not designed to respond in time. Hazelcast runs in the shared standard memories (RAMs) of a cluster of servers which sits in between the applications and the datasources. It gets data from the data sources, pulls it up in the memory and serves it to the applications at in-memory speeds, instead of RPM speeds. This makes Hazelcast a low-latency solution (fast response times).
When it comes to streaming data, they come from sources like files, Apache Kafka, IoT and MQ and they are born in the moment. Hazelcast captures this data in the moment and effectively processes it on the wire. And this makes Hazelcast a real-time processing platform.
The unique capability of Hazelcast is its ability to process both batch and streaming data, with low latency and in real-time, enabling transactional and analytical processing.
Architecture Overview
The fundamental key components of Hazelcast are as follows:
-
A member is the computational and data storage unit in Hazelcast. Typically it is a JVM.
-
A Hazelcast cluster is a set of members communicating with each other. Members which run Hazelcast automatically discover one another and form a cluster at runtime.
-
Partitions are the memory segments that store portions of data. They are distributed evenly among the available cluster members. They can contain hundreds or thousands of data entries each, depending on the memory capacity of your system. Hazelcast also automatically creates backups of these partitions which are also distributed in the cluster. This makes Hazelcast resilient to data loss.
Node and Member are interchangeable, and both mean a Java Virtual Machine (JVM) on which one or more instances of Hazelcast are in operation. |
Hazelcast’s streaming engine focuses on data transformation while it does all the heavy lifting of getting the data flowing and computation running across a cluster of members. It supports working with both bounded (batch) and unbounded (streaming) data.
Hazelcast’s storage engine is the distributed, fast, and operational data store dealing with persistence of data.
Hazelcast comes out of the box with different sources and sinks. Sources are where Hazelcast pulls the data, and sinks are where it outputs the processed data result. Sources and sinks are also referred to as connectors. Its unified connector API provides a simple way to read files, unified across different sources of the data. See the Sources and Sinks section for more information about the unified connector API and the supported sources and sinks.