Metrics Persistence

Your clusters collect and report metrics data for the connected Management Center. Metrics data includes a various number of time series, such as CPU load, memory consumption, and operation counters. See the Metrics section of the Hazelcast documentation for details about configuring metrics collection.

To save memory and improve performance, Management Center groups metrics into one-minute buckets before compressing them and saving them to disk. These persisted metrics are stored in the <User‘s Home Directory>/hazelcast-mc/metrics folder for one day.

If you want to store persisted metrics in a different folder, set the property on the server where Management Center is running.

If you want to store persisted metrics for a different amount of time, use the Time-to-Live setting.

To persist metrics, Management Center starts a metrics-persistence thread that checks for one-minute buckets every 10 seconds. To allow for late metrics due to poor connection or other errors, the metrics-persistence thread waits until one-minute buckets are 70 seconds old. This way, one-minute buckets have a 10-second threshold in which late data may be saved to disk.

For example:

  1. Management Center receives the first chunk of metrics at 9:00:00 AM.

  2. A one-minute bucket is created for each metric type (CPU load, memory consumption, etc) in memory and each bucket’s timestamp is set to 9:00:00 AM. At this point nothing is saved to disk.

  3. The next chunk of metrics is received at 9:00:03 AM. No new minute buckets are created, only the new metrics are added to the existing one-minute buckets.

  4. The first metrics-persistence thread run starts at 9:00:10 AM. The thread didn’t find any minute buckets that are 70 seconds old. Still, nothing is stored on disk.

  5. At 9:01:10 AM, the metrics-persistence thread stores the compressed one-minute buckets on disk.

Each in-memory minute bucket takes around 0.5 KB of memory.

Using Metrics Persistence

Metrics Persistence allows you to check the status of the cluster at a time in the past.

You can go back in time using the date picker on the left corners of the charts and check your cluster’s state at the selected time. All the data structures and members can be monitored as if you are using the Management Center normally (charts and data tables for each data structure and member).

You can press the Now button next to the date picker to see the latest data. Note that this will only show you the latest data on a chart and not cause the other charts and data tables to refresh.

Metrics Persistence diagnostics

You may see the following warning messages in the Management Center log:

Detected slow background persistence runs: 6 runs took 61000 ms. Consider decreasing count of collected metrics.

This message indicates that for the last minute Management Center tried to persist more metrics than underlying persistent storage or disk throughput.

If this message keeps showing up in the log, please consider decreasing the count of collected metrics on the cluster side. E.g., change the statistics-enabled configuration attributes of distributed data structures to false.

If this recommendation is ignored, Management Center will be using more JVM heap space, in order to keep both already accumulated minute buckets, and those that are accumulating at the moment, so you may have to increase the maximum heap size in JVM.

Each hour since Management Center start you will see the following info message in the log:

Known time series: 10959
Tracked minute buckets: 4678
Number of persistence runs: 11
Total persistence run time: 00:00:00.093
Max persistence run time: 00:00:00.065
Average persistence run time: 00:00:00.008
Last hour average persistence run time: 00:00:00.008
Total persisted minute buckets: 1319
Max persisted minute buckets per run: 1319
Average persisted minute buckets per run: 119
Last hour average persisted minute buckets per run: 119
Persistent store TTL: 1 day(s)
Persistent store size on disk: 13.9 MiB
Data point memory compression ratio: 32.25

It provides some basic persistence statistics and active configuration. All time values logged in HH:mm:ss.SSS format. Data point memory compression ratio shows how many times less disk space is used than in-memory.

Configuring RocksDB temporary directory

Management Center uses the RocksDB native library under the hood for metrics persistence. By default, the RocksDB binary is extracted into the directory. In case you run into any file system permission errors, you can override this default by setting the ROCKSDB_SHAREDLIB_DIR environment variable to the absolute path of another directory. See the following example:

mkdir $HOME/tmp/rocksdb
export ROCKSDB_SHAREDLIB_DIR="$(realpath $HOME/tmp/rocksdb)"
java -jar hazelcast-management-center-5.0.1.jar