High-Density Memory Store
By default, data structures in Hazelcast store data on heap in serialized form for highest data compaction. However, these data structures are still subject to Java Garbage Collection (GC). Modern hardware has much more available memory. If you want to make use of that hardware and scale up by specifying higher heap sizes, GC becomes an increasing problem. The application faces long GC pauses that make the application unresponsive. Also, you may get out of memory errors if you fill your whole heap. Garbage collection, which is the automatic process that manages the application’s runtime memory, often forces you into configurations where multiple JVMs with small heaps (sizes of 2-4GB per member) run on a single physical hardware device to avoid garbage collection pauses. This results in oversized clusters to hold the data and leads to performance issues.
In Hazelcast Enterprise, the High-Density Memory Store is Hazelcast’s enterprise in-memory storage solution. It addresses garbage collection limitations so that applications can exploit hardware memory more efficiently without the need for oversized clusters. High-Density Memory Store is designed as a pluggable memory manager which enables multiple memory stores for different data structures. These memory stores are all accessible through a common access layer that scales to very large amounts of main memory on a single JVM by minimizing GC pressure. High-Density Memory Store enables predictable application scaling, improves performance, and lowers latency while minimizing garbage collection pauses.
This foundation includes, but is not limited to, storing keys and values next to the heap in a native memory region.
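To illustrate what storing data outside the heap means, the following minimal sketch copies a serialized value into a direct buffer using the standard java.nio.ByteBuffer API. It is only an illustration of native memory in general. It does not show how the High-Density Memory Store is implemented, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: this is not Hazelcast's implementation.
final class OffHeapSketch {

    // Copies the serialized bytes into native (off-heap) memory.
    static ByteBuffer store(byte[] serialized) {
        ByteBuffer nativeBuffer = ByteBuffer.allocateDirect(serialized.length);
        nativeBuffer.put(serialized);
        nativeBuffer.flip(); // switch the buffer from writing to reading
        return nativeBuffer;
    }

    // Copies the bytes back onto the heap so that they can be deserialized.
    static byte[] load(ByteBuffer nativeBuffer) {
        byte[] copy = new byte[nativeBuffer.remaining()];
        // duplicate() shares the content but keeps the original position untouched
        nativeBuffer.duplicate().get(copy);
        return copy;
    }

    public static void main(String[] args) {
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        ByteBuffer stored = store(value);
        System.out.println("direct buffer: " + stored.isDirect());
        System.out.println(new String(load(stored), StandardCharsets.UTF_8));
    }
}
```

The payload of a direct buffer lives outside the Java heap, so the garbage collector does not need to scan or move it; only the small buffer object itself is heap-allocated.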
High-Density Memory Store is currently provided for the following Hazelcast features and implementations:
Java Client, when using the Near Cache for client
Paging and Partition Predicates
Configuring High-Density Memory Store
To use the High-Density memory storage, the native memory usage must be enabled using the programmatic or declarative configuration. Also, you can configure its size, memory allocator type, minimum block size, page size and metadata space percentage.
The following are the configuration element descriptions:
size: Size of the total native memory to allocate in megabytes. Its default value is 512 MB.
allocator type: Type of the memory allocator. Available values are as follows:
STANDARD: This option is used internally by Hazelcast’s POOLED allocator type or for debugging/testing purposes.
With this option, the memory is allocated or deallocated using your operating system’s default memory manager.
It uses GNU C Library’s standard free() methods, which are subject to contention on multithreaded/multicore systems.
Memory operations may become slower when you perform a lot of small allocations and deallocations.
It may cause large memory fragmentation, unless you use a method in the background that emphasizes fragmentation avoidance, such as jemalloc(). Note that large memory fragmentation can trigger the Linux Out of Memory Killer if there is no swap space enabled in your system. Even if swap space is enabled, the killer can still be triggered if there is not enough swap space left.
If you still want to use the operating system’s default memory management, you can set the allocator type to STANDARD in your native memory configuration.
POOLED: This is the default option, Hazelcast’s own pooling memory allocator.
With this option, memory blocks are managed using internal memory pools.
It allocates memory blocks, each of which has a 4MB page size by default, and splits them into chunks or merges them to create larger chunks when required. Sizing of these chunks follows the buddy memory allocation algorithm, i.e., power-of-two sizing.
It never frees memory blocks back to the operating system. It marks disposed memory blocks as available to be used later, meaning that these blocks are reusable.
Memory allocation and deallocation operations (except the ones requiring sizes larger than the page size) mostly do not interact with the operating system.
For memory allocation, it tries to find the requested memory size inside the internal memory pools. If it cannot be found, then it interacts with the operating system.
minimum block size: Minimum size of the blocks in bytes to split and fragment a page block to assign to an allocation request. It is used only by the POOLED memory allocator. Its default value is 16 bytes.
page size: Size of the page in bytes to allocate memory as a block. It is used only by the POOLED memory allocator. Its default value is 1 << 22 = 4194304 bytes, about 4 MB.
metadata space percentage: Defines the percentage of the allocated native memory that is used for internal memory structures by the High-Density Memory for tracking the used and available memory blocks. It is used only by the POOLED memory allocator. Its default value is 12.5. Please note that when the memory runs out, you get a NativeOutOfMemoryException; if your store has a large number of entries, you should consider increasing this percentage.
persistent-memory: See Using the High-Density Memory Store with Persistent Memory Devices.
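The power-of-two sizing used by the POOLED allocator can be sketched as follows. This is a simplified illustration that assumes a request is rounded up to the next power of two and never below the minimum block size, which is itself assumed to be a power of two. It ignores per-block headers and any other internal details, and the class and method names are hypothetical.

```java
// Hypothetical sketch of buddy-style (power-of-two) block sizing.
final class BuddySizingSketch {

    static final long DEFAULT_MIN_BLOCK_SIZE = 16;     // bytes, default stated above
    static final long DEFAULT_PAGE_SIZE = 1L << 22;    // 4194304 bytes, default stated above

    // Rounds the request up to the next power of two, but never below minBlockSize.
    static long blockSizeFor(long requestedBytes, long minBlockSize) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException("requestedBytes must be positive");
        }
        if (requestedBytes <= minBlockSize) {
            return minBlockSize;
        }
        long highest = Long.highestOneBit(requestedBytes);
        return highest == requestedBytes ? requestedBytes : highest << 1;
    }

    // True when the rounded block fits into a single page of the pool.
    static boolean fitsInPage(long requestedBytes, long minBlockSize, long pageSize) {
        return blockSizeFor(requestedBytes, minBlockSize) <= pageSize;
    }

    public static void main(String[] args) {
        System.out.println(blockSizeFor(10, DEFAULT_MIN_BLOCK_SIZE));   // 16
        System.out.println(blockSizeFor(100, DEFAULT_MIN_BLOCK_SIZE));  // 128
        System.out.println(blockSizeFor(4096, DEFAULT_MIN_BLOCK_SIZE)); // 4096
        System.out.println(fitsInPage(5_000_000, DEFAULT_MIN_BLOCK_SIZE, DEFAULT_PAGE_SIZE)); // false
    }
}
```

Under these assumptions, a 100-byte request occupies a 128-byte block, while a request that rounds up to more than one page cannot be served from a pooled page.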
The following is the programmatic configuration example.
MemorySize memorySize = new MemorySize(512, MemoryUnit.MEGABYTES);
NativeMemoryConfig nativeMemoryConfig = new NativeMemoryConfig()
        .setAllocatorType(NativeMemoryConfig.MemoryAllocatorType.POOLED)
        .setSize(memorySize)
        .setEnabled(true)
        .setMinBlockSize(16)
        .setPageSize(1 << 20);
The following is the declarative configuration example.
<hazelcast>
    ...
    <native-memory allocator-type="POOLED" enabled="true">
        <size unit="MEGABYTES" value="512"/>
        <min-block-size>16</min-block-size>
        <page-size>4194304</page-size>
        <metadata-space-percentage>12.5</metadata-space-percentage>
        <persistent-memory>
            <directories>
                <directory numa-node="0">/mnt/pmem0</directory>
                <directory numa-node="1">/mnt/pmem1</directory>
            </directories>
        </persistent-memory>
    </native-memory>
    ...
</hazelcast>
hazelcast:
  native-memory:
    enabled: true
    allocator-type: POOLED
    size:
      unit: MEGABYTES
      value: 512
    min-block-size: 16
    page-size: 4194304
    metadata-space-percentage: 12.5
    persistent-memory:
      directories:
        - directory: /mnt/pmem0
          numa-node: 0
        - directory: /mnt/pmem1
          numa-node: 1
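As a rough worked example of the metadata space percentage, the sketch below computes how much of the configured native memory would be set aside for metadata with the values used in the configuration above (512 megabytes and 12.5 percent). It assumes that a megabyte is 1024 * 1024 bytes and that the percentage applies to the total configured size; the exact internal accounting may differ. The class and method names are hypothetical.

```java
// Hypothetical arithmetic sketch for the metadata space percentage.
final class MetadataSpaceSketch {

    // Returns the number of bytes reserved for metadata, assuming the
    // percentage is taken from the total configured native memory size.
    static long metadataBytes(long totalBytes, double metadataPercentage) {
        return (long) (totalBytes * metadataPercentage / 100.0);
    }

    public static void main(String[] args) {
        long total = 512L * 1024 * 1024;            // 536870912 bytes
        long metadata = metadataBytes(total, 12.5);
        System.out.println(metadata);               // 67108864 bytes, that is 64 MB
        System.out.println(total - metadata);       // 469762048 bytes, that is 448 MB
    }
}
```

Under these assumptions, about 64 MB of the 512 MB is used for tracking blocks, which leaves roughly 448 MB for the data itself.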
You can check whether there is enough free physical memory for the
requested number of bytes using the system property
Using the High-Density Memory Store with Persistent Memory Devices
To extend the memory available to the High-Density Memory Store, you can use persistent memory devices, such as Intel® Optane™ DC. This is a cost-efficient way to provide additional storage for data structures like IMap, ICache, and Near Cache.
Importantly, the High-Density Memory Store uses the memory provided by the persistent memory device as volatile memory. This means that all data stored on the device is lost when a Hazelcast member or cluster is restarted. To recover data from individual members or clusters after planned or unplanned shutdowns, you need to persist data on disk.
To use a persistent memory device, you don’t need to make any changes to your application code. The following example shows you how to configure the dual in-line memory modules (DIMMs) of Intel® Optane™ DC as the High-Density Memory Store in Hazelcast.
Although the example describes the configuration of a dual socket machine, this is not a requirement for using persistent memory devices.
Configuring Intel® Optane™
Linux x86_64 operating system
Dual socket machine with both sockets populated with Intel® Optane™ DC DIMMs configured in interleaved mode
DIMMs mounted as /mnt/pmem1, and known as NUMA node0 and node1 respectively
Running persistent memory devices on Linux x86_64 is the only configuration currently supported.
Example Configuration Elements and Attributes
The persistent-memory element in the native-memory configuration block enables the use of the persistent memory device and defines the directories where this memory is mounted, along with its operational mode.
<hazelcast>
    ...
    <native-memory allocator-type="POOLED" enabled="true">
        <size unit="GIGABYTES" value="100" />
        <persistent-memory enabled="true" mode="MOUNTED">
            <directories>
                <directory numa-node="0">/mnt/pmem0</directory>
                <directory numa-node="1">/mnt/pmem1</directory>
            </directories>
        </persistent-memory>
    </native-memory>
    ...
</hazelcast>
hazelcast:
  native-memory:
    enabled: true
    allocator-type: POOLED
    size:
      unit: GIGABYTES
      value: 100
    persistent-memory:
      enabled: true
      mode: MOUNTED
      directories:
        - directory: /mnt/pmem0
          numa-node: 0
        - directory: /mnt/pmem1
          numa-node: 1
Config config = new Config();
NativeMemoryConfig memoryConfig = new NativeMemoryConfig()
        .setEnabled(true)
        .setSize(new MemorySize(100, MemoryUnit.GIGABYTES))
        .setAllocatorType(POOLED);
PersistentMemoryConfig pmemConfig = memoryConfig.getPersistentMemoryConfig()
        .setEnabled(true)
        .setMode(MOUNTED)
        .addDirectoryConfig(new PersistentMemoryDirectoryConfig("/mnt/pmem0", 0))
        .addDirectoryConfig(new PersistentMemoryDirectoryConfig("/mnt/pmem1", 1));
config.setNativeMemoryConfig(memoryConfig);
The following elements and attributes are also used in the configuration:
enabled: Specifies whether use of memory on the persistent memory device is enabled or not. The default value is false, that is, use of the device is disabled.
mode: Defines the operational mode of the persistent memory device. Two modes are supported:
MOUNTED: If you choose this mode, the memory is mounted into the file system (aka FS DAX).
SYSTEM_MEMORY: If you choose this mode, Hazelcast uses the already onlined persistent memory portion of the system memory (aka KMEM DAX).
directories: A list of the mounted directories for the persistent memory device which are available for the storage of all data structures backed by the High-Density Memory Store. When you specify the mounted directories, the following rules apply:
Use of memory on the persistent memory device is enabled automatically, you do not need to explicitly set the
If you don’t specify a directory for the persistent memory device, standard RAM is used instead.
Since on multi-socket machines there could be multiple mount points for persistent memory devices, the memory allocations need to follow an allocation strategy. Starting with 4.1, Hazelcast supports two allocation strategies:
Round-robin allocation strategy
NUMA-aware allocation strategy
Hazelcast’s memory allocator chooses and statically caches one of them for every allocator thread for the entire lifetime of the Hazelcast instance.
Round-robin Allocation Strategy
Hazelcast iterates over the configured memory directories and makes sure that every allocation is done in a different directory from the last. This is a best-effort attempt to distribute the allocations evenly on the memory modules, which is also important for memory utilization and performance. This is the default allocation strategy.
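The idea behind the round-robin strategy can be sketched as follows. This is a minimal illustration that assumes a fixed list of configured directories and a shared counter; it is not Hazelcast’s actual allocator code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a round-robin choice over the configured directories.
final class RoundRobinDirectorySketch {

    private final List<String> directories;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDirectorySketch(List<String> directories) {
        if (directories.isEmpty()) {
            throw new IllegalArgumentException("at least one directory is required");
        }
        this.directories = new ArrayList<>(directories);
    }

    // Returns the directory for the next allocation, cycling through the list
    // so that consecutive allocations land in different directories.
    String nextDirectory() {
        // floorMod keeps the index non-negative even if the counter overflows
        int index = Math.floorMod(next.getAndIncrement(), directories.size());
        return directories.get(index);
    }

    public static void main(String[] args) {
        List<String> dirs = new ArrayList<>();
        dirs.add("/mnt/pmem0");
        dirs.add("/mnt/pmem1");
        RoundRobinDirectorySketch selector = new RoundRobinDirectorySketch(dirs);
        System.out.println(selector.nextDirectory()); // /mnt/pmem0
        System.out.println(selector.nextDirectory()); // /mnt/pmem1
        System.out.println(selector.nextDirectory()); // /mnt/pmem0
    }
}
```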
NUMA-aware Allocation Strategy
The persistent memory modules are mounted in memory slots just like the regular memory modules, and share the same memory bus. Therefore, the same NUMA locality concerns apply to the memory from a persistent memory device as to regular memory. It is cheaper to access memory modules attached to the socket on which the current thread runs than modules attached to a different socket. These are typically referred to as NUMA-local and NUMA-remote memory, respectively. To achieve the best possible performance, Hazelcast implements a NUMA-aware allocation strategy to ensure that all accesses to memory modules are local, if certain conditions hold.
To enable this allocation strategy for a certain thread, the thread must be bound to a single NUMA node, which means the kernel’s scheduler makes sure that the thread can only be scheduled on the CPUs of a single NUMA node. Starting with Hazelcast 4.1, this can be done with thread group granularity. For a detailed explanation, see CPU Thread Affinity.
Enabling the NUMA-aware allocation strategy for the operation threads can make the biggest impact on performance. For example:
This configuration restricts all 40 operation threads to run on a single NUMA node on a dual-socket 40 core system, where:
node0’s CPU set is
node1’s CPU set is
To discover the NUMA nodes and their CPU sets, use the numactl -H command.
The second requirement for the NUMA-aware strategy is defining the NUMA node for every directory configured for the persistent memory device.
If both parts of the configuration are completed, the threads in the thread groups restricted to run on a single NUMA node use the NUMA-aware allocation strategy. The rest of the threads still use the round-robin strategy. To identify the memory attached to a NUMA node, use the ndctl list -v -m fsdax command. From the output of ndctl, you can check which mount point represents which persistent memory device.
Since both allocation strategies try to allocate from a single memory directory, lack of capacity can prevent the chosen directory from serving the allocation request. In this case, both strategies take the other directories and try to serve the allocation from those. This fallback option compromises the NUMA-aware strategy as there will be NUMA-remote accesses to memory on the persistent memory device.
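The NUMA-local choice with its fallback can be sketched as follows. This is a simplified, single-threaded illustration that assumes each directory has a known NUMA node and a tracked amount of free capacity; the real allocator’s bookkeeping is not described here, and all class, field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: prefer the NUMA-local directory, fall back to the others.
final class NumaAwareFallbackSketch {

    static final class Directory {
        final String path;
        final int numaNode;
        long freeBytes;

        Directory(String path, int numaNode, long freeBytes) {
            this.path = path;
            this.numaNode = numaNode;
            this.freeBytes = freeBytes;
        }
    }

    private final List<Directory> directories;

    NumaAwareFallbackSketch(List<Directory> directories) {
        this.directories = new ArrayList<>(directories);
    }

    // Returns the path of the directory that served the request, or empty
    // if no directory has enough free capacity left.
    Optional<String> allocate(int threadNumaNode, long bytes) {
        // First pass: directories on the calling thread's NUMA node.
        for (Directory d : directories) {
            if (d.numaNode == threadNumaNode && tryReserve(d, bytes)) {
                return Optional.of(d.path);
            }
        }
        // Fallback: any other directory, which means NUMA-remote access.
        for (Directory d : directories) {
            if (d.numaNode != threadNumaNode && tryReserve(d, bytes)) {
                return Optional.of(d.path);
            }
        }
        return Optional.empty();
    }

    private static boolean tryReserve(Directory d, long bytes) {
        if (d.freeBytes < bytes) {
            return false;
        }
        d.freeBytes -= bytes;
        return true;
    }
}
```

For example, with two directories of 100 bytes each on node0 and node1, two 80-byte requests from a thread bound to node0 would be served from the node0 directory first and then, because only 20 bytes remain there, from the node1 directory.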
Performance of Memory Modules
While the memory modules of a persistent memory device are mounted next to the regular memory modules, and share the same memory bus, they have different performance characteristics.
Memory modules on the persistent memory device have higher access latency than the regular memory modules.
Performance of reads and writes on regular memory modules is the same. On a persistent memory device, the memory modules have an asymmetric performance profile, which means that writes are slower than reads.
Despite the above facts, whether the higher latency of the memory from persistent memory devices impacts the performance of Hazelcast depends on multiple factors.
Performance Impacts on Different Use Cases
Hazelcast is a distributed platform so the higher latency of memory on a persistent memory device can easily be hidden by the latency variance of the network. For certain use cases there may be no observable difference in the throughput whether Hazelcast stores its data on memory from a persistent memory device or on regular memory. An example of this is caching, where accessing the entries remotely through Hazelcast clients results in a very similar throughput. Based on our tests with Intel® Optane™ DC persistent memory modules, we recommend Optane modules for the caching use case up to a 10 KB entry size.
Other use cases that don’t involve networking, such as iterating over all entries with entry processors, can be impacted by the higher latency of the memory modules on a persistent memory device, especially if the entry processors update a significant portion of the entries. For this type of use case, the larger the entry size, the greater the impact on performance. Conversely, with smaller entry sizes, the performance of Hazelcast with memory modules on a persistent memory device can be comparable to the performance with regular memory.