This is a prerelease version.

Historical Metrics

Management Center persists metrics so that you can view historical data for each cluster. You can customize how long data is persisted by changing the time-to-live (TTL).

Persisted metrics allow you to check the status of a cluster at a point in the past. Metrics are stored as time series data, such as CPU load, memory consumption, and operation counters.

How Metrics are Persisted

When Management Center receives metrics from a Hazelcast cluster, it groups them by type into one-minute buckets, then compresses the buckets and saves them to disk in the hazelcast-mc/metrics directory. For example, metrics for CPU load go into one bucket, while metrics for memory consumption go into another.
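To illustrate the grouping, here is a minimal shell and awk sketch. It is not Management Center's implementation. It assumes a hypothetical input format of one data point per line (`<seconds> <metric-type> <value>`) and, as a further assumption, aligns each bucket to the start of a wall-clock minute. It only counts the data points in each bucket rather than compressing or storing them, and the function name `group_into_minute_buckets` is hypothetical.

```shell
# Minimal sketch: group data points into one bucket per metric type per minute.
# Hypothetical input format: "<seconds> <metric-type> <value>", one per line.
# Assumption: buckets are aligned to the start of each wall-clock minute.
group_into_minute_buckets() {
  awk '{
    minute = $1 - ($1 % 60)      # start of the one-minute bucket for this point
    key = $2 " " minute          # separate bucket per metric type and minute
    count[key]++                 # count data points that land in the bucket
  }
  END { for (k in count) print k, count[k] }' | LC_ALL=C sort
}

# 32400 is 9:00:00 AM expressed as seconds since midnight.
printf '32400 cpu 0.41\n32403 cpu 0.43\n32400 memory 512\n32461 cpu 0.40\n' |
  group_into_minute_buckets
# cpu 32400 2
# cpu 32460 1
# memory 32400 1
```

Each output line is one bucket: the metric type, the bucket's start time, and the number of data points it holds.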

You can customize where metrics data is persisted. See Configuring Management Center.

Every 10 seconds, Management Center runs a metrics-persistence thread that checks for one-minute buckets that are ready to be saved. To allow for metrics that arrive late because of a poor connection or other errors, the thread waits until a one-minute bucket is 70 seconds old before saving it. This gives each one-minute bucket a 10-second grace period after its minute ends, during which late metrics can still be added before the bucket is saved to disk.

For example:

  1. Management Center receives the first chunk of metrics at 9:00:00 AM.

  2. A one-minute bucket is created in memory for each metric type, and each bucket’s timestamp is set to 9:00:00 AM. At this point, nothing is saved to disk.

  3. The next chunk of metrics is received at 9:00:03 AM. No new one-minute buckets are created; the new metrics are added to the existing one-minute buckets.

  4. The first metrics-persistence thread run starts at 9:00:10 AM. The thread doesn’t find any one-minute buckets that are at least 70 seconds old, so nothing is saved to disk yet.

  5. At 9:01:10 AM, the buckets created at 9:00:00 AM are 70 seconds old, so the metrics-persistence thread compresses them and saves them to disk.
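The timing in this example can be checked with a small shell sketch. It models only the rule described above: a bucket is saved by the first 10-second run at which it is at least 70 seconds old. Times are seconds after 9:00:00 AM, and the function name `first_persist_run` is hypothetical.

```shell
# Sketch of the 70-second persistence rule (not Management Center's code).
PERSIST_AGE=70    # a bucket is persisted once it is at least 70 seconds old
RUN_INTERVAL=10   # the metrics-persistence thread runs every 10 seconds

# Usage: first_persist_run BUCKET_START FIRST_RUN (both in seconds after 9:00:00 AM)
# Prints the time of the first run at which the bucket is old enough to persist.
first_persist_run() {
  bucket_start=$1
  run=$2
  while [ $((run - bucket_start)) -lt "$PERSIST_AGE" ]; do
    run=$((run + RUN_INTERVAL))   # bucket too young; wait for the next run
  done
  echo "$run"
}

t=$(first_persist_run 0 10)      # bucket at 9:00:00, first run at 9:00:10
printf 'Persisted at 9:%02d:%02d AM\n' $((t / 60)) $((t % 60))
# Persisted at 9:01:10 AM
```

Between 9:01:00 and 9:01:10 AM, the 9:00:00 AM buckets are complete but not yet saved. That is the 10-second window in which late metrics can still be added.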

Each in-memory one-minute bucket consumes about 0.5 KB of memory.
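As a rough sizing aid, you can multiply the per-bucket figure by the number of buckets held in memory, for example the "Tracked minute buckets" value from the hourly persistence log. The sketch below does that arithmetic. The function name and the bucket count are hypothetical, and 0.5 KB is an approximate figure, so the result is only an estimate.

```shell
# Rough estimate: buckets held in memory x ~0.5 KB per bucket (approximate figure).
estimate_bucket_memory_kb() {
  awk -v n="$1" 'BEGIN { printf "%.1f\n", n * 0.5 }'
}

estimate_bucket_memory_kb 20000   # hypothetical bucket count
# 10000.0  (KB, roughly 10 MB)
```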

Changing the Metrics Time-to-Live

If you want to customize the number of days for which Management Center persists metrics, you can start Management Center with the hazelcast.mc.metrics.disk.ttl.duration property.

This setting is a soft limit that gives you indirect control over Management Center disk usage. The actual disk usage depends on the volume of metrics that your clusters generate. That volume depends on factors such as the number of cluster members and the number of data structures with statistics enabled in the connected clusters.

Persistence Logs

Every hour, Management Center writes a message to the logs that contains the following entries:

Known time series
Tracked minute buckets
Number of persistence runs
Total persistence run time
Max persistence run time
Average persistence run time
Last hour average persistence run time
Total persisted minute buckets
Max persisted minute buckets per run
Average persisted minute buckets per run
Last hour average persisted minute buckets per run
Total dropped data points that arrived when the target minute bucket is no longer tracked, but not yet released(persisted)
Total evicted dangling minute series
Persistent store TTL
Persistent store size on disk
Data point memory compression ratio

These entries provide basic persistence statistics and details about the active persistence configuration, such as the persistent store TTL. All time values are logged in HH:mm:ss.SSS format.
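Run times are logged as HH:mm:ss.SSS, so converting them to milliseconds can make them easier to compare with the 10-second persistence interval. The following sketch does the conversion with awk. The function name and the sample value are hypothetical. Treating a run longer than the interval as a warning sign is an interpretation, not a documented threshold.

```shell
# Convert an HH:mm:ss.SSS duration to milliseconds.
to_millis() {
  printf '%s\n' "$1" | awk -F'[:.]' '{ printf "%d\n", (($1 * 60 + $2) * 60 + $3) * 1000 + $4 }'
}

max_run_ms=$(to_millis "00:00:12.400")   # hypothetical "Max persistence run time"
echo "$max_run_ms"                       # 12400
if [ "$max_run_ms" -gt 10000 ]; then
  echo "Longest run exceeded the 10-second persistence interval"
fi
```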

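The data point memory compression ratio shows how many times smaller the on-disk representation of the data points is than their in-memory representation. For example, a ratio of 10 means that the data takes up one tenth as much space on disk as it does in memory.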
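The ratio compares in-memory size with on-disk size, so you can estimate one size from the other. The sketch below performs only that arithmetic. The function name and input values are hypothetical.

```shell
# ratio = in-memory size / on-disk size, so on-disk size = in-memory size / ratio.
estimate_disk_kb() {
  awk -v mem_kb="$1" -v ratio="$2" 'BEGIN { printf "%.1f\n", mem_kb / ratio }'
}

estimate_disk_kb 10000 8   # hypothetical: 10000 KB in memory, ratio of 8
# 1250.0
```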

Troubleshooting

Use this section to find suggestions for resolving errors that may occur with metrics persistence.

RocksDB File Permissions

Management Center uses the RocksDB native library to persist metrics. By default, the RocksDB native library is extracted into the directory set by the java.io.tmpdir system property. If file system permissions prevent Management Center from using that directory, set the ROCKSDB_SHAREDLIB_DIR environment variable to the absolute path of another directory, as in the following example:

# Create a directory that the user running Management Center can write to
mkdir -p "$HOME/tmp/rocksdb"
export ROCKSDB_SHAREDLIB_DIR="$(realpath "$HOME/tmp/rocksdb")"
hz-mc start

Slow Background Persistence Runs

If Management Center tries to persist more metrics than the underlying persistent storage or disk throughput can handle, you will see the following warning:

Detected slow background persistence runs: 6 runs took 61000 ms. Consider decreasing count of collected metrics.
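One way to read the warning is to divide the total time by the number of runs. For the sample message, 61000 ms over 6 runs is about 10.2 seconds per run, which is longer than the 10-second interval between persistence runs. The sketch below extracts those two numbers from a message in the format shown above and prints the integer average. The function name is hypothetical, and the exact condition that triggers the warning is not described here.

```shell
# Print the average run time (ms) from a slow-persistence warning, or fail if
# the line does not match the format of the sample warning.
avg_slow_run_ms() {
  pattern='.*: \([0-9][0-9]*\) runs took \([0-9][0-9]*\) ms.*'
  runs=$(printf '%s\n' "$1" | sed -n "s/$pattern/\1/p")
  total=$(printf '%s\n' "$1" | sed -n "s/$pattern/\2/p")
  [ -n "$runs" ] && [ "$runs" -gt 0 ] || return 1
  echo $((total / runs))   # integer division
}

warning='Detected slow background persistence runs: 6 runs took 61000 ms. Consider decreasing count of collected metrics.'
avg_slow_run_ms "$warning"   # 10166
```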

To resolve this issue, you can do the following: