High-Density Memory Store
Hazelcast IMDG Enterprise HD
By default, data structures in Hazelcast store data on heap in serialized form for highest data compaction; yet, these data structures are still subject to Java Garbage Collection (GC). Modern hardware has much more available memory. If you want to make use of that hardware and scale up by specifying larger heap sizes, GC becomes an increasing problem: the application faces long GC pauses that make it unresponsive. Also, you may get out-of-memory errors if you fill your whole heap. Garbage collection, which is the automatic process that manages the application’s runtime memory, often forces you into configurations where multiple JVMs with small heaps (2-4 GB per member) run on a single physical machine to avoid long garbage collection pauses. This results in oversized clusters to hold the data and makes performance requirements harder to meet.
In Hazelcast IMDG Enterprise HD, the High-Density Memory Store is Hazelcast’s new enterprise in-memory storage solution. It solves garbage collection limitations so that applications can exploit hardware memory more efficiently without the need for oversized clusters. High-Density Memory Store is designed as a pluggable memory manager which enables multiple memory stores for different data structures. These memory stores are all accessible through a common access layer that scales up to massive amounts of main memory on a single JVM by minimizing GC pressure. High-Density Memory Store enables predictable application scaling and improves performance and latency while minimizing garbage collection pauses.
This foundation includes, but is not limited to, storing keys and values next to the heap in a native memory region.
High-Density Memory Store is currently provided for the following Hazelcast features and implementations:
To use the High-Density Memory Store, native memory usage must be enabled using the programmatic or declarative configuration. You can also configure its size, memory allocator type, minimum block size, page size and metadata space percentage.
The following are the configuration element descriptions:
size: Size of the total native memory to allocate in megabytes. Its default value is 512 MB.
allocator type: Type of the memory allocator. Available values are as follows:
STANDARD: This option is used internally by Hazelcast’s POOLED allocator type or for debugging/testing purposes.
With this option, the memory is allocated or deallocated using your operating system’s default memory manager.
It uses GNU C Library’s standard malloc() and free() methods, which are subject to contention on multithreaded/multicore systems.
Memory operations may become slower when you perform a lot of small allocations and deallocations.
It may cause large memory fragmentation, unless you use a method in the background that emphasizes fragmentation avoidance, such as jemalloc(). Note that large memory fragmentation can trigger the Linux Out of Memory Killer if there is no swap space enabled in your system. Even if swap space is enabled, the killer can still be triggered if there is not enough swap space left.
If you still want to use the operating system’s default memory management, you can set the allocator type to STANDARD in your native memory configuration.
POOLED: This is the default option, Hazelcast’s own pooling memory allocator.
With this option, memory blocks are managed using internal memory pools.
It allocates memory blocks, each of which has a 4MB page size by default, and splits them into chunks or merges them to create larger chunks when required. Sizing of these chunks follows the buddy memory allocation algorithm, i.e., power-of-two sizing.
It never frees memory blocks back to the operating system. It marks disposed memory blocks as available to be used later, meaning that these blocks are reusable.
Memory allocation and deallocation operations (except the ones requiring larger sizes than the page size) mostly do not interact with the operating system.
For memory allocation, it tries to find the requested memory size inside the internal memory pools. If it cannot be found, then it interacts with the operating system.
minimum block size: Minimum size, in bytes, of the blocks into which a page is split to serve an allocation request. It is used only by the POOLED memory allocator. Its default value is 16 bytes.
page size: Size of the page in bytes to allocate memory as a block. It is used only by the POOLED memory allocator. Its default value is 1 << 22 = 4194304 bytes, i.e., 4 MB.
metadata space percentage: Defines the percentage of the allocated native memory that is used for internal memory structures by the High-Density Memory Store for tracking the used and available memory blocks. It is used only by the POOLED memory allocator. Its default value is 12.5. Please note that when the memory runs out, you get a NativeOutOfMemoryException; if your store has a large number of entries, you should consider increasing this percentage.
persistent-memory: See the Using Persistent Memory section below.
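The power-of-two sizing mentioned for the POOLED allocator can be illustrated with a small, standalone sketch. This is not Hazelcast's allocator code: the class and method names are hypothetical, the sketch ignores any per-block bookkeeping, and it assumes that both the minimum block size and the page size are powers of two.

```java
// Simplified illustration of power-of-two block sizing.
// Hypothetical helper, not part of the Hazelcast API.
final class BuddySizing {

    /** Returned when a request does not fit into a single page. */
    static final long EXCEEDS_PAGE = -1L;

    /**
     * Rounds a requested size up to the next power of two, but never below
     * minBlockSize. Requests larger than pageSize cannot be served from a page.
     */
    static long blockSizeFor(long requestedBytes, long minBlockSize, long pageSize) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException("requestedBytes must be positive");
        }
        if (requestedBytes > pageSize) {
            return EXCEEDS_PAGE;
        }
        long rounded = requestedBytes == 1
                ? 1
                : Long.highestOneBit(requestedBytes - 1) << 1;
        return Math.max(rounded, minBlockSize);
    }

    public static void main(String[] args) {
        long minBlockSize = 16;   // default minimum block size from the text
        long pageSize = 1L << 22; // default page size from the text (4 MB)
        long[] requests = {1, 16, 17, 100, 5000, pageSize, pageSize + 1};
        for (long request : requests) {
            System.out.println(request + " -> " + blockSizeFor(request, minBlockSize, pageSize));
        }
    }
}
```

In this model, a 17-byte request occupies a 32-byte block, which is why many small entries of awkward sizes can consume noticeably more native memory than their raw size suggests.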
The following is the programmatic configuration example.
MemorySize memorySize = new MemorySize(512, MemoryUnit.MEGABYTES);
NativeMemoryConfig nativeMemoryConfig = new NativeMemoryConfig()
        .setAllocatorType(NativeMemoryConfig.MemoryAllocatorType.POOLED)
        .setSize(memorySize)
        .setEnabled(true)
        .setMinBlockSize(16)
        // 1 MB pages in this example; the default is 1 << 22 (4 MB)
        .setPageSize(1 << 20);
The following is the declarative configuration example.
<hazelcast>
    ...
    <native-memory allocator-type="POOLED" enabled="true">
        <size unit="MEGABYTES" value="512"/>
        <min-block-size>16</min-block-size>
        <page-size>4194304</page-size>
        <metadata-space-percentage>12.5</metadata-space-percentage>
        <persistent-memory enabled="true" mode="MOUNTED">
            <directories>
                <directory numa-node="0">/mnt/pmem0</directory>
                <directory numa-node="1">/mnt/pmem1</directory>
            </directories>
        </persistent-memory>
    </native-memory>
    ...
</hazelcast>
hazelcast:
  native-memory:
    enabled: true
    allocator-type: POOLED
    size:
      unit: MEGABYTES
      value: 512
    min-block-size: 16
    page-size: 4194304
    metadata-space-percentage: 12.5
    persistent-memory:
      enabled: true
      mode: MOUNTED
      directories:
        - directory: /mnt/pmem0
          numa-node: 0
        - directory: /mnt/pmem1
          numa-node: 1
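As a rough sizing aid, the arithmetic behind the values used in the declarative example (512 MB total, 12.5 percent metadata space, 16-byte minimum blocks, 4 MB pages) can be worked through as follows. The helper names are hypothetical, and the sketch assumes that the metadata share is taken out of the configured size, as the element description suggests; it does not reflect Hazelcast's internal accounting.

```java
// Back-of-the-envelope sizing for a native memory configuration.
// Hypothetical helper, not part of the Hazelcast API.
final class NativeMemorySizing {

    /** Bytes set aside for allocator metadata, given the total size and a percentage. */
    static long metadataBytes(long totalBytes, double metadataPercentage) {
        return (long) (totalBytes * metadataPercentage / 100.0);
    }

    /** Number of power-of-two block sizes from minBlockSize up to pageSize, inclusive. */
    static int blockSizeClasses(long minBlockSize, long pageSize) {
        return Long.numberOfTrailingZeros(pageSize)
                - Long.numberOfTrailingZeros(minBlockSize) + 1;
    }

    public static void main(String[] args) {
        long total = 512L * 1024 * 1024;              // size: 512 MEGABYTES
        long metadata = metadataBytes(total, 12.5);   // metadata-space-percentage: 12.5
        System.out.println("metadata MB: " + metadata / (1024 * 1024));
        System.out.println("remaining MB: " + (total - metadata) / (1024 * 1024));
        System.out.println("block size classes: " + blockSizeClasses(16, 4194304));
    }
}
```

With these inputs, 64 MB of the 512 MB goes to metadata. If a store holds a very large number of small entries, the metadata share is the first figure to revisit.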
You can check whether there is enough free physical memory for the
requested number of bytes using the system property
The High-Density Memory Store uses the persistent memory in its volatile mode, which means all data is lost after the instance restarts. For durability, please check the Hot Restart Persistence feature.
To support larger and more affordable storage for data structures like IMap, ICache and Near Cache, Hazelcast provides integration with persistent memory technologies like Intel® Optane™ DC. To benefit from the technology, you do not need to make any changes in your application code. Only a few configuration changes are required.
Note that integration with Intel® Optane™ DC is supported on the Linux operating system and it is for Optane DIMMs (not SSDs).
The persistent-memory element in the native-memory configuration block
enables the persistent memory usage and defines the directories where this memory
is mounted along with its operational mode. See the element descriptions below the
following configuration snippets.
<hazelcast>
    ...
    <native-memory allocator-type="POOLED" enabled="true">
        <size unit="GIGABYTES" value="100" />
        <persistent-memory enabled="true" mode="MOUNTED">
            <directories>
                <directory numa-node="0">/mnt/pmem0</directory>
                <directory numa-node="1">/mnt/pmem1</directory>
            </directories>
        </persistent-memory>
    </native-memory>
    ...
</hazelcast>
hazelcast:
  native-memory:
    enabled: true
    allocator-type: POOLED
    size:
      unit: GIGABYTES
      value: 100
    persistent-memory:
      enabled: true
      mode: MOUNTED
      directories:
        - directory: /mnt/pmem0
          numa-node: 0
        - directory: /mnt/pmem1
          numa-node: 1
// POOLED and MOUNTED are assumed to be statically imported enum constants.
Config config = new Config();
NativeMemoryConfig memoryConfig = new NativeMemoryConfig()
        .setEnabled(true)
        .setSize(new MemorySize(100, MemoryUnit.GIGABYTES))
        .setAllocatorType(POOLED);
// Persistent memory in MOUNTED mode, one directory per NUMA node
PersistentMemoryConfig pmemConfig = memoryConfig.getPersistentMemoryConfig()
        .setEnabled(true)
        .setMode(MOUNTED)
        .addDirectoryConfig(new PersistentMemoryDirectoryConfig("/mnt/pmem0", 0))
        .addDirectoryConfig(new PersistentMemoryDirectoryConfig("/mnt/pmem1", 1));
config.setNativeMemoryConfig(memoryConfig);
The above snippets demonstrate how to configure the persistent memory as
High-Density Memory Store in Hazelcast. The example assumes a dual-socket machine;
both sockets are populated with Intel® Optane™ DC persistent memory DIMMs that
are configured in interleaved mode. The two sockets' DIMMs are mounted as
/mnt/pmem0 and /mnt/pmem1 and are known as NUMA node0 and node1, respectively.
Here are the descriptions of the persistent memory configuration elements and attributes:
enabled: Specifies whether the persistent memory usage is enabled or not. Its default value is false, i.e., persistent memory usage is disabled.
mode: Defines the persistent memory operational mode. Two modes are supported:
MOUNTED: If you choose this mode, the persistent memory is mounted into the file system (aka FS DAX).
SYSTEM_MEMORY: If you choose this mode, the persistent memory is onlined as system memory (aka KMEM DAX).
directories: List of the persistent memory mounting directories to store data of all the data structures backed by High-Density Memory Store. When you provide the directories:
the persistent memory usage is enabled automatically and you do not need to explicitly set the enabled attribute
mode should be set to MOUNTED
If no persistent memory directory is configured, standard RAM is used.
Since on multi-socket machines there could be multiple persistent memory mount points, the memory allocations need to follow an allocation strategy. Starting with 4.1, Hazelcast supports two allocation strategies:
Round-robin allocation strategy
NUMA-aware allocation strategy
Hazelcast’s memory allocator chooses and statically caches one of them for every allocator thread for the entire lifetime of the Hazelcast instance.
With the round-robin allocation strategy, Hazelcast iterates over the configured persistent memory directories and makes sure every allocation is done in a different directory than the last. This is a best-effort attempt to distribute the allocations evenly on the persistent memory DIMMs, which is important for both utilization and performance. This is the default allocation strategy.
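The rotation described above can be sketched in a few lines. This is only an illustration of the idea, not Hazelcast's implementation: the class name is hypothetical, and the sketch keeps a plain counter because it assumes one instance per allocator thread.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative round-robin selection over persistent memory directories.
// Hypothetical helper, not part of the Hazelcast API; not thread-safe.
final class RoundRobinDirectories {

    private final List<String> directories;
    private int next; // index of the directory to use for the next allocation

    RoundRobinDirectories(List<String> directories) {
        if (directories.isEmpty()) {
            throw new IllegalArgumentException("at least one directory is required");
        }
        this.directories = new ArrayList<>(directories);
    }

    /** Returns the directory for the next allocation and advances the rotation. */
    String nextDirectory() {
        String chosen = directories.get(next);
        next = (next + 1) % directories.size();
        return chosen;
    }

    public static void main(String[] args) {
        List<String> configured = new ArrayList<>();
        configured.add("/mnt/pmem0");
        configured.add("/mnt/pmem1");
        RoundRobinDirectories rotation = new RoundRobinDirectories(configured);
        for (int i = 0; i < 4; i++) {
            System.out.println("allocation " + i + " -> " + rotation.nextDirectory());
        }
    }
}
```

With two configured directories, consecutive allocations alternate between them, which spreads the load across the DIMMs of both sockets.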
The persistent memory modules are mounted in the memory slots just like the regular memory modules and share the same memory bus. Therefore, the same NUMA locality concerns that apply to regular memory also apply to persistent memory. This means accessing the persistent memory modules attached to the socket on which the current thread runs is cheaper than accessing the persistent memory modules attached to a different socket. These are typically referred to as NUMA-local and NUMA-remote memory. To achieve the best possible performance, Hazelcast implements a NUMA-aware allocation strategy to ensure all persistent memory accesses are local, if certain conditions hold.
To enable this allocation strategy for a certain thread, the thread has to be bound to a single NUMA node, which means the kernel’s scheduler makes sure the thread can be scheduled only on the CPUs of a single NUMA node. Starting with Hazelcast 4.1, this can be done with thread group granularity. For a detailed explanation, please refer to the thread affinity documentation. What makes the biggest impact on performance is enabling the NUMA-aware allocation strategy for the operation threads. An example configuration for that is as follows:
The above example configuration restricts each of the 40 operation threads to run on a single NUMA node
on a dual-socket 40 core system, where node0’s CPU set is
[0-9,20-29] and node1’s CPU set is
[10-19,30-39]. The NUMA nodes and their CPU sets can be
discovered by the
numactl -H command.
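To make the CPU set notation concrete, the sketch below expands a range list such as 10-19,30-39 into individual CPU ids and derives the other node's set as the complement. The parser is a hypothetical helper written for this illustration, not Hazelcast's affinity parser, and it assumes that the 40 CPUs are numbered 0 to 39.

```java
import java.util.BitSet;

// Expands CPU range lists like "10-19,30-39" into sets of CPU ids.
// Hypothetical helper, not part of the Hazelcast API.
final class CpuSets {

    /** Parses a comma-separated list of CPU ids and inclusive ranges. */
    static BitSet parse(String cpuList) {
        BitSet cpus = new BitSet();
        for (String part : cpuList.split(",")) {
            String range = part.trim();
            int dash = range.indexOf('-');
            if (dash < 0) {
                cpus.set(Integer.parseInt(range));
            } else {
                int from = Integer.parseInt(range.substring(0, dash));
                int to = Integer.parseInt(range.substring(dash + 1));
                cpus.set(from, to + 1); // the upper bound of BitSet.set is exclusive
            }
        }
        return cpus;
    }

    /** Returns the CPUs in [0, cpuCount) that are not in the given set. */
    static BitSet complement(BitSet cpus, int cpuCount) {
        BitSet all = new BitSet(cpuCount);
        all.set(0, cpuCount);
        all.andNot(cpus);
        return all;
    }

    public static void main(String[] args) {
        BitSet node1 = parse("10-19,30-39");   // node1's CPU set from the text
        BitSet node0 = complement(node1, 40);  // assumed: CPUs are numbered 0..39
        System.out.println("node1 CPUs: " + node1.cardinality());
        System.out.println("node0 CPUs: " + node0.cardinality());
    }
}
```

Each node ends up with 20 CPUs, so a thread group bound to one of these sets can run only on the CPUs of a single socket.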
The second requirement for the NUMA-aware strategy is defining the NUMA node
for every persistent memory directory in the configuration. If both configurations
are done properly, the threads in the thread groups restricted to run on a single
NUMA node will use the NUMA-aware allocation strategy, while the rest of threads
will still use the round-robin strategy. To check which persistent memory is
attached to which NUMA node, the command
ndctl list -v -m fsdax can be used.
Please check which mount point represents which persistent memory device in the
Since both allocation strategies try to allocate from a single persistent memory directory, the chosen directory may be unable to serve the allocation request due to lack of free capacity. In this case, both strategies try to serve the allocation from the other directories. Please note that this compromises the NUMA-aware strategy, since it results in NUMA-remote persistent memory accesses.
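The fallback behavior can be modeled with a small sketch: prefer the directory on the thread's own NUMA node, and move on to the remaining directories only when the preferred one lacks capacity. All names here are hypothetical and the capacity tracking is deliberately simplistic; it is a model of the described behavior, not Hazelcast's allocator.

```java
import java.util.ArrayList;
import java.util.List;

// Models NUMA-local preference with fallback to other directories.
// Hypothetical helper, not part of the Hazelcast API.
final class PmemDirectoryChooser {

    /** A persistent memory directory with its NUMA node and remaining capacity. */
    static final class Directory {
        final String path;
        final int numaNode;
        long freeBytes;

        Directory(String path, int numaNode, long freeBytes) {
            this.path = path;
            this.numaNode = numaNode;
            this.freeBytes = freeBytes;
        }
    }

    /**
     * Tries the directories on the thread's NUMA node first, then the others.
     * Returns the directory that served the request, or null if none could.
     */
    static Directory allocate(List<Directory> directories, int threadNumaNode, long bytes) {
        List<Directory> order = new ArrayList<>();
        for (Directory d : directories) {
            if (d.numaNode == threadNumaNode) {
                order.add(d); // NUMA-local candidates come first
            }
        }
        for (Directory d : directories) {
            if (d.numaNode != threadNumaNode) {
                order.add(d); // NUMA-remote candidates are the fallback
            }
        }
        for (Directory d : order) {
            if (d.freeBytes >= bytes) {
                d.freeBytes -= bytes;
                return d;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Directory> dirs = new ArrayList<>();
        dirs.add(new Directory("/mnt/pmem0", 0, 100));
        dirs.add(new Directory("/mnt/pmem1", 1, 1000));
        // A thread bound to node0 allocates twice; the second request no longer fits locally.
        System.out.println(allocate(dirs, 0, 80).path);
        System.out.println(allocate(dirs, 0, 80).path);
    }
}
```

In the usage above, the second allocation is served from the directory on the other node, which is exactly the NUMA-remote access the text warns about. Sizing each directory for its node's share of the data helps keep such fallbacks rare.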
While the persistent memory modules are mounted next to the regular memory modules and share the same memory bus, the two types of modules have different performance characteristics. First, the persistent memory modules have higher access latency than the regular memory modules. Second, while reads and writes perform the same with regular memory modules, this is not the case with persistent memory modules. Persistent memory has an asymmetric performance profile, which means writes are slower than reads.
Despite the above facts, whether the higher latency of the persistent memory impacts the performance of Hazelcast depends on multiple factors. Since Hazelcast is a distributed platform, the higher latency of the persistent memory modules can easily be hidden by the latency variance of the network. As a result, in certain use cases there may be no observable difference in the throughput of Hazelcast whether it stores its data on persistent memory or on regular memory. One such use case is caching, where accessing the entries remotely through Hazelcast clients results in very similar throughput. Based on our tests with Intel® Optane™ DC persistent memory modules, we recommend the Optane modules for the caching use case up to 10 KB entry size.
Other use cases that don’t involve networking, such as iterating over all entries with entry processors, can be impacted by the higher latency of the persistent memory modules, especially if the entry processors update a significant portion of the entries. In general, in such a use case, the larger the entry size, the higher the impact on performance. That means with smaller entry sizes the performance of Hazelcast with persistent memory can be comparable to the performance with regular memory.