Distributed Computing
Explore the tools that Hazelcast offers for distributed computing on cluster members.
What is Distributed Computing?
Distributed computing is the process of running computational tasks on different cluster members. With distributed computing, computations are faster thanks to the following advantages:
-
Leveraging the combined processing power of a cluster.
-
Reducing network hops by running computations on the cluster that owns the data
Available Tools
Hazelcast offers the following tools for distributed computing, depending on your use case:
-
Entry processor: Update, remove, and read map entries on cluster members (servers).
-
Executor service: Execute your own Java code on cluster members and receive a result.
-
Pipeline: Create a data pipeline that runs fast batch or streaming processes on cluster members.
When to use an Entry Processor
An entry processor is a good option if you perform bulk processing on a map. Usually you perform a loop of keys, executing map.get(key)
, mutating the value and finally putting the entry back in the map using map.put(key,value)
. If you perform this process from a client or from a member where the keys do not exist, you effectively perform two network hops for each update: The first to retrieve the data and the second to update the mutated value.
If you are doing the process described above, you should consider using entry processors. An entry processor executes read and update operations on the member where the data resides. This eliminates the costly network hops.
Entry processors are meant to process a single entry per call. Processing multiple entries and data structures in an entry processor is not supported as it may result in deadlocks. To process multiple entries at once, use a pipeline. |
When to use an Executor Service
The executor service is ideal for running arbitrary Java code on your cluster members.
When to use a Pipeline
Pipelines are a good fit when you want to perform processing that involves multiple entries (aggregations, joins, etc.), or involves multiple computing steps to be made parallel. Pipelines allow you to update maps as a result of your computation, using an entry processor sink.