Distributed Data Structures
Hazelcast offers
distributed implementations of many common data structures. For each of the client
languages, Hazelcast mimics as closely as possible the natural interface of the
structure. For example in Java, the map follows java.util.Map
semantics.
All of these structures are available in Java, .NET, C++, Node.js, Python, and Go.
Data structure | Description | Partitioned/Non-partitioned | AP/CP |
---|---|---|---|
Key-value pairs that are partitioned across a cluster. Maps offer a wide range of features such as SQL queries, WAN replication, and Near Cache. |
|||
A distributed mechanism for publishing messages that are delivered to multiple subscribers, also known as the publish/subscribe (pub/sub) messaging model. |
|||
Key-value pairs that are replicated across each member in the cluster. |
|||
A data structure for adding an item in one member/client and removing it from another one. |
|||
Hazelcast’s specification-compliant JCache implementation. |
|||
For building reliable eventing systems. |
|||
A distributed and concurrent collection that contains no duplicate elements and does not preserve their order. |
|||
Similar to Hazelcast set, except a list allows duplicate elements and preserves their order. |
|||
A specialized Hazelcast map. It is a distributed data structure where you can store multiple values for a single key. |
|||
Similar to Hazelcast topic, except reliable topics are backed up by the ringbuffer data structure. |
|||
A data structure that implements Flajolet’s HyperLogLog algorithm. |
|||
A lock that provides exclusive access to a shared resource; only one thread at a time can acquire the lock and all access to the shared resource requires that the lock be acquired first. |
|||
A data structure that creates permits that control the thread counts when performing concurrent activities. |
|||
A data structure for dealing with |
|||
A data structure for dealing with a reference in a distributed environment. |
|||
A distributed gate keeper for concurrent activities; it enables the threads to wait for other threads to complete their operations. |
|||
A data structure for generating cluster-wide unique identifiers. |
|||
A data structure where each Hazelcast instance can increment and decrement the counter value and these updates are propagated to all replicas. |
|||
A data structure that stores the history of mutation actions on map or cache data structures. |
Partitioned vs Non-Partitioned
Hazelcast data structures use one of the following partitioning strategies:
-
Partitioned: Each partition stores a part of the whole data structure.
-
Non-partitioned: A single partition stores the whole data structure.
For details about how Hazelcast partitions data, see Hazelcast’s Replication Algorithm
Controlling Partitions
Hazelcast uses the name of a distributed object to determine in which partition it will be put.
In this example, the queues have different names, so they will be placed into different partitions:
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
IQueue q1 = hazelcastInstance.getQueue("q1");
IQueue q2 = hazelcastInstance.getQueue("q2");
If you want to put these two into the same partition, you use the @
symbol:
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
IQueue q1 = hazelcastInstance.getQueue("q1@foo");
IQueue q2 = hazelcastInstance.getQueue("q2@foo");
Now, these two queues will be put into the same partition whose partition key is foo
.
Note that you can use the getPartitionKey()
method to learn the partition key of a distributed object.
It may be useful when you want to create an object in the same partition as an existing object:
String partitionKey = q1.getPartitionKey();
IQueue q3 = hazelcastInstance.getQueue("q3@"+partitionKey);
Loading Objects
Hazelcast offers a get()
method for most of its distributed objects. To load an object,
first create a Hazelcast instance and then use the related get()
method on this instance.
The following code snippet creates a Hazelcast member and a map:
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance();
Map<Integer, String> customers = hazelcastInstance.getMap( "customers" );
Destroying Objects
To destroy a Hazelcast distributed object, you can use the destroy()
method, which clears
and releases all resources of the object. Therefore, you must use it with care since a reload
with the same object reference after the object is destroyed creates a new data structure without
an error. See the following example code where one of the queues are destroyed and the other
one is accessed.
HazelcastInstance hz1 = Hazelcast.newHazelcastInstance();
HazelcastInstance hz2 = Hazelcast.newHazelcastInstance();
IQueue<String> q1 = hz1.getQueue("q");
IQueue<String> q2 = hz2.getQueue("q");
q1.add("foo");
System.out.println("q1.size: "+q1.size()+ " q2.size:"+q2.size());
q1.destroy();
System.out.println("q1.size: " + q1.size() + " q2.size:" + q2.size());
If you start the member, you see the following output:
q1.size: 1 q2.size:1
q1.size: 0 q2.size:0
As you see, no error is generated and a new queue resource is created.
Hazelcast is designed to create any distributed data structure whenever it is accessed.
Therefore, keep in mind that a data structure is recreated when you perform an operation on it even after you have destroyed it.
Configuring Data Structures
As to the configuration of distributed objects, Hazelcast uses the default settings
from the hazelcast.xml
file that comes with your Hazelcast member download. Of course,
you can provide an explicit configuration in this file or programmatically according
to your needs. See the Understanding Configuration section.
Note that, most of Hazelcast’s distributed objects are created lazily: A distributed object is created once the first operation accesses it.
If you want to use an object you loaded in other places, you can safely reload it using its reference without creating a new Hazelcast instance.
AP Data Structures
AP data structures prefer availability over consistency.
When a partition occurs, all members remain available but some might return an older version of data than others. When the partition is resolved, the members usually resynchronize to repair any inconsistencies:
-
If a member goes down, its backup replica (which holds the same data) dynamically redistributes the data, including the ownership and locks on them, to the remaining live members. As a result, there will not be any data loss.
-
There is no single cluster master that can be a single point of failure. Every member in the cluster has equal rights and responsibilities. No single member is superior.
Example Code
Here is an example of how you can retrieve existing AP data structure instances and how you can listen for instance events, such as an instance being created or destroyed.
public class ExampleDOL implements DistributedObjectListener {
public static void main(String[] args) {
ExampleDOL example = new ExampleDOL();
Config config = new Config();
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance(config);
hazelcastInstance.addDistributedObjectListener(example);
Collection<DistributedObject> distributedObjects = hazelcastInstance.getDistributedObjects();
for (DistributedObject distributedObject : distributedObjects) {
System.out.println(distributedObject.getName());
}
}
@Override
public void distributedObjectCreated(DistributedObjectEvent event) {
DistributedObject instance = event.getDistributedObject();
System.out.println("Created " + instance.getName());
}
@Override
public void distributedObjectDestroyed(DistributedObjectEvent event) {
Object objectName = event.getObjectName();
System.out.println("Destroyed " + objectName);
}
}
The getDistributedObjects() method returns internal API objects
with prefixes, including __sql and __jet , such as __sql.catalog . You can ignore these objects.
|
CP Data Structures
A CP data structure delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two CP members, the non-consistent CP member must be shut down until the partition is resolved.
To learn about CP members and how Hazelcast delivers consistency and partition tolerance, see CP Subsystem.
The CP data structures differ from AP data structures in two aspects:
-
An internal commit is performed on the
METADATA
CP group every time you fetch a CP data structure object. Hence, the callers should cache the returned objects. -
If you call the
destroy()
method on a CP data structure object, that data structure is terminated in the underlying CP group and cannot be reinitialized until the CP group is force-destroyed. For this reason, please make sure that you are completely done with a CP data structure before destroying it.
Example Code
CPSubsystem cpSubsystem = hazelcastInstance.getCPSubsystem();
IAtomicLong atomicLong = cpSubsystem.getAtomicLong(name);
IAtomicReference atomicRef = cpSubsystem.getAtomicReference(name);
FencedLock lock = cpSubsystem.getLock(name);
ISemaphore semaphore = cpSubsystem.getSemaphore(name);
ICountDownLatch latch = cpSubsystem.getCountDownLatch(name);