Frequently Asked Questions

What guarantees does Hazelcast offer?

Hazelcast is distributed and highly available by nature. This is achieved by always keeping the data partition backup on another Hazelcast member.

Hazelcast offers AP and CP functionality with different data structure implementations (see CAP theorem). Data structures exposed under HazelcastInstance API are all AP data structures. Hazelcast also contains a CP subsystem, built on the Raft consensus algorithm and accessed via HazelcastInstance.getCPSubsystem() which provides CP data structures and APIs.

AP Hazelcast guarantees:
- With lazy replication, when the primary replica receives an update operation for a key, it executes the update locally and propagates it to backup replicas. It marks each update with a logical timestamp so that backups apply them in the correct order and converge to the same state with the primary. Backup replicas can be used to scale reads with no strong consistency but monotonic reads guarantee. For details, see Making Your Map Data Safe.
- It employs additional measurements to maintain consistency in a best-effort manner.
- Hazelcast, as an AP product, does not provide the exactly-once guarantee. In general, Hazelcast tends to be an at-least-once solution.

See the Consistency and Replication Model chapter for more information.

CP Hazelcast guarantees:
- It builds a strongly consistent layer for a set of distributed data structures. You can enable the CP Subsystem and use it with the strong consistency guarantee.
- Its data structures are CP with respect to the CAP principle, i.e., they always maintain linearizability and prefer consistency over availability during network partitions.
- Besides network partitions, the CP Subsystem withstands server and client failures.
- It provides a good degree of fault tolerance at run-time, and CP Subsystem Persistence enables more robustness.

See the CP Subsystem chapter for more information.

Why 271 as the default partition count?

The partition count of 271, being a prime number, is a good choice because it is distributed to the members almost evenly. For a small to medium-sized cluster, the count of 271 gives an almost even partition distribution and optimal-sized partitions. As your cluster becomes bigger, you should make this count bigger to ensure evenly distributed partitions.

Is Hazelcast thread-safe?

Yes. All Hazelcast data structures are thread-safe.

How do members discover each other?

When a member is started in a cluster, it is dynamically and automatically discovered. The types of discovery are:

Discovery by TCP/IP: The first member created in the cluster (leader) forms a list of IP addresses of other joining members and sends this list to these members so the members will know each other.
Discovery on clouds: Hazelcast supports discovery on cloud platforms such as Azure and Consul.
Multicast discovery: The members in a cluster discover each other by multicast, by default. It is not recommended for production since UDP is often blocked in production environments and other discovery mechanisms are more definite.

Once members are discovered, all the communication between them is via TCP/IP.

For detailed information, see the Discovery Mechanisms section.

What happens when a member goes down?

Once a member is gone (crashes), the following happens:

First, the backups in other members are restored.
Then, data from these restored backups are recovered.
And finally, new backups for these recovered data are formed.

So eventually, availability of the data is maintained.

How do I test the connectivity?

If you notice that there is a problem with a member joining a cluster, you may want to perform a connectivity test between the member to be joined and a member from the cluster. You can use the iperf tool for this purpose. For example, you can execute the below command on one member (i.e. listening on port 5701).

iperf -s -p 5701

And you can execute the below command on the other member.

iperf -c <IP address> -d -p 5701

The output should include connection information, such as the IP addresses, transfer speed and bandwidth. Otherwise, if the output says No route to host, it means a network connection problem exists.

How do I choose keys properly?

When you store a key and value in a distributed map, Hazelcast serializes the key and value and stores the byte array version of them in local ConcurrentHashMaps. These ConcurrentHashMaps use equals and hashCode methods of byte array version of your key. It does not take into account the actual equals and hashCode implementations of your objects. So it is important that you choose your keys in an appropriate way.

Implementing equals and hashCode is not enough, and it is important that the object is always serialized into the same byte array. All primitive types like String, Long, Integer, etc. are good candidates for keys to be used in Hazelcast. An unsorted set is an example of a bad candidate because Java Serialization may serialize the same unsorted set in two different byte arrays.

How do I reflect value modifications?

Hazelcast always returns a clone copy of a value. Modifying the returned value does not change the actual value in the map (or multimap, list, set). You should put the modified value back to make changes visible to all members.

V value = map.get( key );
value.updateSomeProperty();
map.put( key, value );

Collections which return values of methods (such as IMap.keySet, IMap.values, IMap.entrySet, MultiMap.get, MultiMap.remove, IMap.keySet, IMap.values) contain cloned values. These collections are NOT backed up by related Hazelcast objects. Therefore, changes to them are NOT reflected in the originals and vice-versa.

How do I test my Hazelcast cluster?

Hazelcast allows you to create more than one instance on the same JVM. Each member is called HazelcastInstance and each has its own configuration, socket and threads, so you can treat them as totally separate instances.

This enables you to write and to run cluster unit tests on a single JVM. Because you can use this feature for creating separate members with different applications running on the same JVM (imagine running multiple web applications on the same JVM), you can also use this feature for testing your Hazelcast cluster.

For example, let’s say you want to test if two members have the same size of a map:

@Test
public void testTwoMemberMapSizes() {
  // start the first member
  HazelcastInstance h1 = Hazelcast.newHazelcastInstance();
  // get the map and put 1000 entries
  Map map1 = h1.getMap( "testmap" );
  for ( int i = 0; i < 1000; i++ ) {
    map1.put( i, "value" + i );
  }
  // check the map size
  assertEquals( 1000, map1.size() );
  // start the second member
  HazelcastInstance h2 = Hazelcast.newHazelcastInstance();
  // get the same map from the second member
  Map map2 = h2.getMap( "testmap" );
  // check the size of map2
  assertEquals( 1000, map2.size() );
  // check the size of map1 again
  assertEquals( 1000, map1.size() );
}

In the test above, everything happens in the same thread. When developing a multithreaded test, you need to carefully handle coordination of the thread executions. It is highly recommended that you use CountDownLatch for thread coordination (although you can use other ways). Here is an example where we need to listen for messages and make sure that we got these messages.

@Test
public void testTopic() {
  // start two member cluster
  HazelcastInstance h1 = Hazelcast.newHazelcastInstance();
  HazelcastInstance h2 = Hazelcast.newHazelcastInstance();
  String topicName = "TestMessages";
  // get a topic from the first member and add a messageListener
  ITopic<String> topic1 = h1.getTopic( topicName );
  final CountDownLatch latch1 = new CountDownLatch( 1 );
  topic1.addMessageListener( new MessageListener() {
    public void onMessage( Object msg ) {
      assertEquals( "Test1", msg );
      latch1.countDown();
    }
  });
  // get a topic from the second member and add a messageListener
  ITopic<String> topic2 = h2.getTopic(topicName);
  final CountDownLatch latch2 = new CountDownLatch( 2 );
  topic2.addMessageListener( new MessageListener() {
    public void onMessage( Object msg ) {
      assertEquals( "Test1", msg );
      latch2.countDown();
    }
  } );
  // publish the first message, both should receive this
  topic1.publish( "Test1" );
  // shutdown the first member
  h1.shutdown();
  // publish the second message, second member's topic should receive this
  topic2.publish( "Test1" );
  try {
    // assert that the first member's topic got the message
    assertTrue( latch1.await( 5, TimeUnit.SECONDS ) );
    // assert that the second members' topic got two messages
    assertTrue( latch2.await( 5, TimeUnit.SECONDS ) );
  } catch ( InterruptedException ignored ) {
  }
}

You can start Hazelcast members with different configurations. Remember to call Hazelcast.shutdownAll() after each test case to make sure that there is no other running member left from the previous tests.

@After
public void cleanup() throws Exception {
  Hazelcast.shutdownAll();
}

For more information, check our existing tests.

Does Hazelcast support hundreds of members?

Yes. Hazelcast performed a successful test on Amazon EC2 with 200 members.

Does Hazelcast support thousands of clients?

Yes. However, there are some points you should consider. The environment should be LAN with a high stability and the network speed should be 10 Gbps or higher. If the number of members is high, you can select the client type as SINGLE_MEMBER or MULTI_MEMBER, but not ALL_MEMBERS. In the case of Smart Clients, since each client opens a connection to the members, these members should be powerful enough (for example, more cores) to handle hundreds or thousands of connections and client requests. Also, you should consider using Near Caches in clients to lower the network traffic. And you should use the Hazelcast releases with the NIO implementation (which starts with Hazelcast 3.2).

Also, you should configure the clients attentively. See the Clients section for information about configuration.

Difference between lite member and smart client?

Lite member supports task execution (distributed executor service); smart client does not. Also, lite member is highly coupled with cluster; smart client is not. Starting with Hazelcast 3.9, you can also promote lite members to data members. See the Lite Members section for more information.

Does Hazelcast persist?

Yes. Hazelcast offers several optional mechanisms to persist data across restarts and failures:

Persistence: Hazelcast provides a Persistence feature that allows data to be persisted to disk. This feature can be enabled to save data for recovery after planned cluster shutdowns, unplanned single node failures, or unplanned cluster-wide failures.
MapStore and MapLoader: the MapStore and MapLoader interfaces allow you to implement custom persistence logic to store and load data from external systems.
File-based backups: Hazelcast offers features for backing up in-memory maps to files on local disk, persistent memory, or external databases. This includes the Persistence feature and MapStore.
Persistence for specific data structures: You can configure persistence for specific data structures like maps, job snapshots, and SQL metadata. This allows for data recovery in various scenarios including planned shutdowns and unplanned failures.
CP Subsystem Persistence: The CP Subsystem is designed to provide strong consistency and partition tolerance. This means that during network partitions, the system prioritizes consistency over availability, ensuring that all nodes have the most up-to-date and consistent data. The CP Persistence feature enhances the reliability of the CP Subsystem and ensures that the system can recover from failures without data loss.

It’s important to note that these persistence mechanisms are not enabled by default. Hazelcast focuses on in-memory operations for performance, with these persistence options available for scenarios where data durability across restarts or failures is required.

Can I use Hazelcast in a single server?

Yes. But note that Hazelcast’s main design focus is multi-member clusters to be used as a distribution platform.

How can I monitor Hazelcast?

Management Center is what you use to monitor and manage the members running Hazelcast. In addition to monitoring the overall state of a cluster, you can analyze and browse data structures in detail, you can update map configurations and you can take thread dumps from members.

You can also use Hazelcast’s HTTP based health check implementation and health monitoring utility. See the Health Check and Monitoring section. There is also a diagnostics tool where you can see detailed logs enhanced with diagnostic plugins.

Moreover, JMX monitoring is also provided. See the Monitoring with JMX section for details.

How can I see debug level logs?

By changing the log level to "Debug". Below are example lines for log4j logging framework. See the Logging Configuration section to learn how to set logging types.

First, set the logging type as follows.

String location = "log4j.configuration";
String logging = "hazelcast.logging.type";
System.setProperty( logging, "log4j" );
/**if you want to give a new location. **/
System.setProperty( location, "file:/path/mylog4j.properties" );

Then set the log level to "Debug" in the properties file. Below is example content.

# direct log messages to stdout #

log4j.appender.stdout=org.apache.log4j.ConsoleAppender

log4j.appender.stdout.Target=System.out

log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} %5p [%c{1}] - %m%n

log4j.logger.com.hazelcast=debug

#log4j.logger.com.hazelcast.cluster=debug

#log4j.logger.com.hazelcast.partition=debug

#log4j.logger.com.hazelcast.partition.InternalPartitionService=debug

#log4j.logger.com.hazelcast.nio=debug

#log4j.logger.com.hazelcast.hibernate=debug

The line log4j.logger.com.hazelcast=debug is used to see debug logs for all Hazelcast operations. Below this line, you can select to see specific logs (cluster, partition, hibernate, etc.).

Client-server vs. embedded topologies?

In the embedded topology, members include both the data and application. This type of topology is the most useful if your application focuses on high performance computing and many task executions. Since application is close to data, this topology supports data locality.

In the client-server topology, you create a cluster of members and scale the cluster independently. Your applications are hosted on the clients and the clients communicate with the members in the cluster to reach data.

Client-server topology fits better if there are multiple applications sharing the same data or if application deployment is significantly greater than the cluster size (for example, 500 application servers vs. 10 member cluster).

How can I shut down a Hazelcast member?

You can use the following ways to shut down a Hazelcast member:

You can call kill -9 <PID> in the terminal (which sends a SIGKILL signal). This results in immediate shutdown which is not recommended for production systems. If you set the property hazelcast.shutdownhook.enabled to false and then kill the process using kill -15 <PID>, its result is the same (immediate shutdown).
You can call kill -15 <PID> in the terminal (which sends a SIGTERM signal), or you can call the method HazelcastInstance.getLifecycleService().terminate() programmatically, or you can use the script stop.sh located in your Hazelcast’s /bin directory. All three of them terminate your member ungracefully. They do not wait for migration operations; they force the shutdown. But this is much better than kill -9 <PID> since it releases most of the used resources.
To gracefully shut down a Hazelcast member (so that it waits for the migration operations to be completed), you have four options. You can:
- Call the method HazelcastInstance.shutdown() programmatically.
- Use JMX API’s shutdown method. You can do this by implementing a JMX client application or using a JMX monitoring tool (like JConsole).
- Set the property hazelcast.shutdownhook.policy to GRACEFUL and then shutdown by using kill -15 <PID>. Your member will be gracefully shutdown.
- Use the "Shutdown Member" button in the member view of Management Center.

If you use systemd’s systemctl utility, i.e., systemctl stop service_name, a SIGTERM signal is sent. After 90 seconds of waiting it is followed by a SIGKILL signal by default. Thus, it calls terminate at first and kills the member directly after 90 seconds. We do not recommend using it with its defaults. But systemd is very customizable and well-documented — you can see its details using the command man systemd.kill. If you can customize it to shutdown your Hazelcast member gracefully (by using the methods above), then you can use it.

How do I know it is safe to kill the second member?

Starting with Hazelcast 3.7, graceful shutdown of a Hazelcast member can be initiated any time as follows:

hazelcastInstance.shutdown();

Once a Hazelcast member initiates a graceful shutdown, data of the shutting down member is migrated to the other members automatically.

However, there is no such guarantee for termination.

The following code snippet terminates a member if the cluster is safe, which means that there are no partitions being migrated and all backups are in sync when this method is called.

PartitionService partitionService = hazelcastInstance.getPartitionService();
if (partitionService.isClusterSafe()) {
  hazelcastInstance.getLifecycleService().terminate();
}

The following code snippet terminates the local member if the member is safe to terminate, which means that all backups of partitions currently owned by local member are in sync when this method is called.

PartitionService partitionService = hazelcastInstance.getPartitionService();
if (partitionService.isLocalMemberSafe()) {
  hazelcastInstance.getLifecycleService().terminate();
}

Keep in mind that the two code snippets above are inherently racy. If member failures occur in the cluster after the safety condition check passes, termination of the local member can lead to data loss. For safety of the data, graceful shutdown API is highly recommended.

See the Safety Checking Cluster Members section for more information.

When do I need Native Memory solutions?

Native Memory solutions can be preferred when:

the amount of data per member is large enough to create significant garbage collection pauses.
your application requires predictable latency.

Is there any disadvantage to using near cache?

The only disadvantage when using near cache is that it may cause stale reads.

Is Hazelcast secure?

Hazelcast supports symmetric encryption, transport layer security/secure sockets layer (TLS/SSL) and Java Authentication and Authorization Service (JAAS). See the Security chapter for more information.

The symmetric encryption feature has been deprecated. You can use the TLS/SSL protocol to establish an encrypted communication across your Hazelcast cluster.

How can I set socket options?

Hazelcast allows you to set some socket options such as SO_KEEPALIVE, SO_SNDBUF and SO_RCVBUF using Hazelcast configuration properties. See the hazelcast.socket.* properties explained in the System Properties appendix.

Client disconnections during idle time?

In Hazelcast, socket connections are created with the SO_KEEPALIVE option enabled by default. In most operating systems, the default keep-alive time is 2 hours. If you have a firewall between clients and servers which is configured to reset idle connections/sessions, make sure that the firewall’s idle timeout is greater than the TCP keep-alive defined in the OS.

See Using TCP keepalive under Linux and Microsoft Windows for additional information.

OOME: Unable to create new native thread?

If you encounter an error of java.lang.OutOfMemoryError: unable to create new native thread, it may be caused by exceeding the available file descriptors on your operating system, especially if it is Linux. This exception is usually thrown on a running member, after a period of time when the thread count exhausts the file descriptor availability.

The JVM on Linux consumes a file descriptor for each thread created. The default number of file descriptors available in Linux is usually 1024. If you have many JVMs running on a single machine, it is possible to exceed this default number.

You can view the limit using the following command.

# ulimit -a

At the operating system level, Linux users can control the amount of resources (and in particular, file descriptors) used via one of the following options.

1 - Editing the limits.conf file:

# vi /etc/security/limits.conf

testuser soft nofile 4096<br>
testuser hard nofile 10240<br>

2 - Or using the ulimit command:

# ulimit -Hn

The default number of process per users is 1024. Adding the following to your $HOME/.profile could solve the issue:

# ulimit -u 4096

Does repartitioning wait for Entry Processor?

Repartitioning is the process of redistributing the partition ownerships. Hazelcast performs the repartitioning in the cases where a member leaves the cluster or joins the cluster. If a repartitioning happens while an entry processor is active in a member processing on an entry object, the repartitioning waits for the entry processor to complete its job.

Instances on different machines cannot see each other?

Assume you have two instances on two different machines and you develop a configuration as shown below.

Config config = new Config();
NetworkConfig network = config.getNetworkConfig();

JoinConfig join = network.getJoin();
join.getTcpIpConfig().addMember("IP1")
    .addMember("IP2").setEnabled(true);
network.getInterfaces().setEnabled(true)
    .addInterface("IP1").addInterface("IP2");

When you create the Hazelcast instance, you have to pass the configuration to the instance. If you create the instances without passing the configuration, each instance starts but cannot see each other. Therefore, a correct way to create the instance is as follows:

HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);

The following is an incorrect way:

HazelcastInstance instance = Hazelcast.newHazelcastInstance();

What does "Replica: 1 has no owner" mean?

When you start more members after the first one is started, you will see a replica: 1 has no owner entry in the newly started member’s log. This is as expected since it refers to a transitory state. It means the replica partition is not ready/assigned yet, and eventually will be.