List of Hazelcast Metrics

The tables below list the metrics and their explanations, grouped by subject.

The metrics are collected per member and are specific to the local member from which you collect them. For example, the distributed data structure metrics reflect the local statistics of that data structure for the portion held in that member.

Some metrics may store a cluster-wide agreed value; that is, they may show values obtained by communicating with other members in the cluster. Metrics of this type reflect the member’s local view of the cluster (consider split-brain scenarios). The clusterStartTime metric is an example: its value on the local member is obtained by communicating with the master member.
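Because most values are local to a member, a cluster-wide figure has to be aggregated by whoever collects the metrics. The following is a minimal sketch of that idea, assuming you have already read the same metric (map.ownedEntryCount is used as an example) from every member. The class and method names are hypothetical and not part of any Hazelcast API.

```java
import java.util.List;

final class LocalMetricAggregation {
    /** Sums one metric's per-member values, e.g. map.ownedEntryCount sampled on each member. */
    static long sumAcrossMembers(List<Long> perMemberValues) {
        long total = 0;
        for (long value : perMemberValues) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical map.ownedEntryCount readings from a three-member cluster.
        List<Long> ownedEntryCounts = List.of(3_400L, 3_250L, 3_350L);
        System.out.println(sumAcrossMembers(ownedEntryCounts)); // 10000
    }
}
```

Summing only makes sense for metrics that count member-local items. Values such as clusterStartTime already represent a cluster-wide view and should not be added up.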

Streaming Engine Cluster-Wide Metrics
Name Description Tags

blockingWorkerCount

Number of non-cooperative workers employed.

none

Each Hazelcast member will have one instance of this metric.

jobs.submitted

Number of computational jobs submitted.

jobs.completedSuccessfully

Number of computational jobs successfully completed.

jobs.completedWithFailure

Number of computational jobs that have failed.

jobs.executionStarted

Number of computational job executions started. Each job can execute multiple times, for example when it’s restarted or suspended and then resumed.

jobs.executionTerminated

Number of computational job executions finished. Each job can execute multiple times, for example when it’s restarted or suspended and then resumed.

iterationCount

The total number of iterations the driver of tasklets in cooperative thread N has made. It should increase by at least 250 iterations/s. A lower rate means that some of the cooperative processors block for too long. A somewhat lower rate is normal if there are many tasklets assigned to the processor. A lower rate increases latency.

cooperativeWorker

Each Hazelcast member will have one instance of this metric for each of its cooperative worker threads.

taskletCount

The number of tasklets assigned to cooperative thread N.
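The 250 iterations/s guideline for iterationCount can be checked by sampling the counter twice and dividing the increase by the elapsed time. The sketch below shows only that arithmetic. The class and method names are hypothetical, and how the samples are obtained is left out.

```java
final class IterationRateCheck {
    // Guideline from the iterationCount description: at least 250 iterations per second.
    static final double MIN_ITERATIONS_PER_SECOND = 250.0;

    /** Iterations per second between two samples of the iterationCount counter. */
    static double iterationsPerSecond(long earlierCount, long laterCount, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        return (laterCount - earlierCount) * 1000.0 / elapsedMillis;
    }

    /** True when the observed rate falls below the guideline. */
    static boolean isTooSlow(long earlierCount, long laterCount, long elapsedMillis) {
        return iterationsPerSecond(earlierCount, laterCount, elapsedMillis) < MIN_ITERATIONS_PER_SECOND;
    }

    public static void main(String[] args) {
        // Hypothetical samples taken 5 seconds apart.
        System.out.println(iterationsPerSecond(10_000, 12_000, 5_000)); // 400.0
        System.out.println(isTooSlow(10_000, 10_500, 5_000));           // true (100 iterations/s)
    }
}
```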

Streaming Engine Job-specific Metrics

All job-specific metrics have their job (name and ID of the job) and exec (job execution ID) tags set and, with very few exceptions, also have the vertex (vertex name) tag set. This means that most of these metrics will have at least one instance for each vertex of each current job execution.

Additionally, if the vertex sourcing them is a data source or data sink, then the source or sink tags will also be set to true.

Name Description Tags

executionStartTime

Start time of the current execution of the job (epoch time in milliseconds).

job, exec

There will be a single instance of these metrics for each job execution.

executionCompletionTime

Completion time of the current execution of the job (epoch time in milliseconds).

status

Information regarding the job’s status. Potential values: 0 - submitted; 1 - initialization phase; 2 - running; 3 - suspended; 4 - exporting snapshot; 5 - completing phase; 6 - failed; 7 - completed successfully.

userCancelled

Details on whether the user canceled the job. Potential values: 1 - canceled by the user, 0 - otherwise. Exposed via Job API only.

snapshotBytes

Total number of bytes written out in the last snapshot.

job, exec, vertex

There will be a single instance of these metrics for each vertex.

snapshotKeys

Total number of keys written out in the last snapshot.

distributedBytesIn

Total number of bytes received from remote members.

job, exec, vertex, ordinal

Each Hazelcast member will have an instance of these metrics for each ordinal of each vertex of each job execution.

Note: These metrics are only present for distributed edges, i.e., edges producing network traffic.

distributedBytesOut

Total number of bytes sent to remote members.

distributedItemsIn

Total number of items received from remote members.

distributedItemsOut

Total number of items sent to remote members.

topObservedWm

This value is equal to the highest coalescedWm on any input edge of this processor.

job, exec, vertex, proc

Each Hazelcast member will have one instance of these metrics for each processor instance N, where N denotes the global processor index. A processor is the parallel worker doing the work of the vertex.

coalescedWm

The highest watermark received from all inputs that was sent to the processor to handle.

lastForwardedWm

Last watermark emitted by the processor to output.

lastForwardedWmLatency

The difference between lastForwardedWm and the system time at the moment when metrics were collected.

queuesCapacity

The total capacity of input queues.

queuesSize

The total number of items waiting in input queues.

topObservedWm

The highest received watermark from any input on edge N.

job, exec, vertex, proc, ordinal

Each Hazelcast member will have one instance of these metrics for each edge M (input or output) of each processor N. N is the global processor index and M is either the ordinal of the edge or has the value snapshot for output items written to state snapshot.

coalescedWm

The highest watermark received from all upstream processors on edge N.

emittedCount

The number of emitted items. This number includes watermarks, snapshot barriers, etc. Unlike distributedItemsOut, it includes items emitted to local processors.

receivedCount

The number of received items. This number does not include watermarks, snapshot barriers, etc. It’s the number of items the Processor.process method will receive.

receivedBatches

The number of received batches. Processor.process receives a batch of items at a time; this is the number of such batches. Dividing receivedCount by receivedBatches gives the average batch size, which will be 1 under low load.

numInFlightOps

The number of pending (in flight) operations when using asynchronous mapping processors. See Processors.mapUsingServiceAsyncP.

job, exec, vertex, proc, procType

Processor-specific metrics; only certain types of processors have them. The procType tag can be used to identify the exact type of processor sourcing them. Like all processor metrics, each Hazelcast member will have one instance of these metrics for each processor instance N, where N denotes the global processor index.

totalKeys

The number of active keys being tracked by a session window processor.

totalWindows

The number of active windows being tracked by a session window processor. See Processors.aggregateToSessionWindowP.

totalFrames

The number of active frames being tracked by a sliding window processor.

totalKeysInFrames

The number of grouping keys associated with the current active frames of a sliding window processor. See Processors.aggregateToSlidingWindowP.

lateEventsDropped

The number of late events dropped by various processors because the watermark had already passed their windows.
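Two of the job-specific metrics above are easier to read with a little post-processing: the numeric status value, and the average batch size obtained by dividing receivedCount by receivedBatches. The following sketch illustrates both. The class and method names are hypothetical helpers, and the status names are copied from the status description above.

```java
final class JobMetricHelpers {
    // Status codes as listed in the description of the status metric (index = code).
    private static final String[] STATUS_NAMES = {
        "submitted", "initialization phase", "running", "suspended",
        "exporting snapshot", "completing phase", "failed", "completed successfully"
    };

    /** Translates the numeric status metric into its documented meaning. */
    static String statusName(int status) {
        if (status < 0 || status >= STATUS_NAMES.length) {
            return "unknown (" + status + ")";
        }
        return STATUS_NAMES[status];
    }

    /** Average batch size: receivedCount divided by receivedBatches. */
    static double averageBatchSize(long receivedCount, long receivedBatches) {
        // No batches received yet: report 0 rather than dividing by zero.
        return receivedBatches == 0 ? 0.0 : (double) receivedCount / receivedBatches;
    }

    public static void main(String[] args) {
        System.out.println(statusName(2));                 // running
        System.out.println(averageBatchSize(12_000, 300)); // 40.0
    }
}
```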

Map

Name

Unit

Description

map.backupCount

count

Number of backups per entry

map.backupEntryCount

count

Number of backup entries held by the member

map.backupEntryMemoryCost

bytes

Memory cost of backup entries in this member

map.creationTime

ms

Creation time of the map on the member

map.dirtyEntryCount

count

Number of dirty entries (updated but not yet persisted) that the member owns

map.evictionCount

count

Number of evictions that happened on locally owned entries; backups are not included

map.expirationCount

count

Number of expirations that happened on locally owned entries; backups are not included

map.getCount

count

Number of local get operations on the map; it is incremented for every get operation even if the entries do not exist.

map.heapCost

count

Total heap cost for the map on this member

map.hits

count

Number of reads of the locally owned entries; it is incremented for every read by any type of operation (get, set, put), so the entries must exist.

map.indexedQueryCount

count

Total number of indexed local queries performed on the map

map.lastAccessTime

ms

Last access (read) time of the locally owned entries

map.lastUpdateTime

ms

Last update time of the locally owned entries

map.lockedEntryCount

count

Number of locked entries that the member owns

map.merkleTreesCost

count

Total heap cost of the Merkle trees used

map.numberOfEvents

count

Number of local events received on the map

map.numberOfOtherOperations

count

Total number of other operations performed on this member

map.ownedEntryCount

count

Number of map entries owned by the member

map.ownedEntryMemoryCost

bytes

Memory cost of owned map entries on this member

map.putCount

count

Number of local put operations on the map

map.queryCount

count

Number of queries executed on the map (it may be imprecise for queries involving partition predicates (PartitionPredicate) on the off-heap storage)

map.removeCount

count

Number of local remove operations on the map

map.setCount

count

Number of local set operations on the map

map.totalGetLatency

ms

Total latency of local get operations on the map

map.totalMaxGetLatency

ms

Maximum latency of local get operations on the map

map.totalMaxPutLatency

ms

Maximum latency of local put operations on the map

map.totalMaxRemoveLatency

ms

Maximum latency of local remove operations on the map

map.totalMaxSetLatency

ms

Maximum latency of local set operations on the map

map.totalPutLatency

ms

Total latency of local put operations on the map

map.totalRemoveLatency

ms

Total latency of local remove operations on the map

map.totalSetLatency

ms

Total latency of local set operations on the map

The latency metrics above are measured per member only and do not represent the overall performance of the cluster. We recommend monitoring the average latency of each operation, for example, map.totalGetLatency / map.getCount and map.totalSetLatency / map.setCount. An increased average latency is a sign that the cluster may be experiencing performance problems or that there is a spike in the load. Possible reasons include:

  • Increase in the load on the cluster: If the cluster is under heavy load, this can lead to increased latency for all operations, slowing down the overall performance.

  • Increasing member count in the cluster: As the number of cluster members increases, the total latency for operations can also increase. This is because the cluster has to communicate with more members, which can add to the overall latency. This might be a data architecture problem.

  • Increasing the data set size: This causes the cluster to search through more data to find the requested data, which can slow down the overall performance. Creating indexes may solve this kind of problem.

  • Increasing the number of concurrent operations: This causes the cluster to process more requests at the same time, which can slow down the overall performance. This is a potential bottleneck on resources (CPU, memory, network).
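The recommended averages, such as map.totalGetLatency / map.getCount, can be sketched as follows. The class and method names are hypothetical. The interval variant assumes that both the latency total and the operation count are cumulative, so that subtracting two samples gives the activity in between.

```java
final class AverageLatency {
    /** Average latency in ms per operation, e.g. map.totalGetLatency / map.getCount. */
    static double averageMillis(long totalLatencyMillis, long operationCount) {
        // No operations yet: report 0 rather than dividing by zero.
        return operationCount == 0 ? 0.0 : (double) totalLatencyMillis / operationCount;
    }

    /**
     * Average latency over the interval between two samples.
     * Assumes both metrics are cumulative (an assumption, not stated in the table).
     */
    static double averageMillisBetween(long latencyBefore, long countBefore,
                                       long latencyAfter, long countAfter) {
        return averageMillis(latencyAfter - latencyBefore, countAfter - countBefore);
    }

    public static void main(String[] args) {
        // Hypothetical samples of map.totalGetLatency and map.getCount.
        System.out.println(averageMillis(4_500, 9_000));                        // 0.5
        System.out.println(averageMillisBetween(4_500, 9_000, 6_500, 10_000)); // 2.0
    }
}
```

In this made-up example the average since the first sample (2.0 ms) is four times the earlier lifetime average (0.5 ms), which is the kind of increase the paragraph above suggests watching for.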

map.index.averageHitLatency

ns

Average hit latency for the index on this member

map.index.averageHitSelectivity

percent

Average selectivity of the hits served by the index on this member. The returned value is in the range from 0.0 to 1.0: values close to 1.0 indicate high selectivity, meaning the index is efficient; values close to 0.0 indicate low selectivity, meaning the index efficiency approaches that of a simple full scan

map.index.creationTime

ms

Creation time of the index on this member

map.index.hitCount

count

Total number of index hits (the value of this metric may be greater than the map.index.queryCount since a single query may hit the same index more than once)

map.index.insertCount

count

Number of insert operations performed on the index

map.index.memoryCost

bytes

Local memory cost of the index (for on-heap indexes in OBJECT or BINARY formats, the returned value is just a best-effort approximation and doesn’t indicate a precise on-heap memory usage of the index)

map.index.queryCount

count

Total number of queries served by the index

map.index.removeCount

count

Number of remove operations performed on the index

map.index.totalInsertLatency

ns

Total latency of insert operations performed on the index

map.index.totalRemoveLatency

ns

Total latency of remove operations performed on the index

map.index.totalUpdateLatency

ns

Total latency of update operations performed on the index.

map.index.updateCount

count

Number of update operations performed on the index
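Since a single query may hit the same index more than once, the ratio of map.index.hitCount to map.index.queryCount gives a rough idea of how many index hits an average query produces. The sketch below only performs that division. The class and method names are hypothetical.

```java
final class IndexHitRatio {
    /** Average number of index hits per query: map.index.hitCount / map.index.queryCount. */
    static double hitsPerQuery(long hitCount, long queryCount) {
        // No queries served yet: report 0 rather than dividing by zero.
        return queryCount == 0 ? 0.0 : (double) hitCount / queryCount;
    }

    public static void main(String[] args) {
        // Hypothetical readings: 300 hits produced by 120 queries.
        System.out.println(hitsPerQuery(300, 120)); // 2.5
    }
}
```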

MultiMap

Name

Unit

Description

multiMap.backupCount

count

Number of backups per entry

multiMap.backupEntryCount

count

Number of backup entries held by the member

multiMap.backupEntryMemoryCost

bytes

Memory cost of backup entries in this member

multiMap.creationTime

ms

Creation time of the multimap in the member

multiMap.dirtyEntryCount

count

Number of dirty (updated but not persisted yet) entries that the member owns

multiMap.getCount

count

Number of local get operations on the multimap

multiMap.heapCost

count

Total heap cost for the multimap on this member

multiMap.hits

count

Number of hits (reads) of the locally owned entries

multiMap.indexedQueryCount

count

Total number of indexed local queries performed on the multimap

multiMap.lastAccessTime

ms

Last access (read) time of the locally owned entries

multiMap.lastUpdateTime

ms

Last update time of the locally owned entries

multiMap.lockedEntryCount

count

Number of locked entries that the member owns

multiMap.merkleTreesCost

count

Heap cost of the Merkle trees

multiMap.numberOfEvents

count

Number of local events received

multiMap.numberOfOtherOperations

count

Total number of other operations

multiMap.ownedEntryCount

count

Number of multimap entries owned by the member

multiMap.ownedEntryMemoryCost

bytes

Memory cost of owned multimap entries on this member

multiMap.putCount

count

Number of local put operations on the multimap

multiMap.queryCount

count

Number of local queries executed on the multimap (it may be imprecise for queries involving partition predicates (PartitionPredicate) on the off-heap storage)

multiMap.removeCount

count

Number of local remove operations on the multimap

multiMap.setCount

count

Number of local set operations on the multimap

multiMap.totalGetLatency

ms

Total latency of local get operations

multiMap.totalMaxGetLatency

ms

Maximum latency of local get operations

multiMap.totalMaxPutLatency

ms

Maximum latency of local put operations

multiMap.totalMaxRemoveLatency

ms

Maximum latency of local remove operations

multiMap.totalMaxSetLatency

ms

Maximum latency of local set operations

multiMap.totalPutLatency

ms

Total latency of local put operations

multiMap.totalRemoveLatency

ms

Total latency of local remove operations

multiMap.totalSetLatency

ms

Total latency of local set operations

Replicated Map

Name

Unit

Description

replicatedMap.creationTime

ms

Creation time of this replicated map on this member

replicatedMap.getCount

count

Number of get operations on this member

replicatedMap.hits

count

Number of hits (reads) of the locally owned entries

replicatedMap.lastAccessTime

ms

Last access (read) time of the locally owned entries

replicatedMap.lastUpdateTime

ms

Last update time of the locally owned entries

replicatedMap.maxGetLatency

ms

Maximum latency of get operations

replicatedMap.maxPutLatency

ms

Maximum latency of put operations

replicatedMap.maxRemoveLatency

ms

Maximum latency of remove operations

replicatedMap.numberOfEvents

count

Number of events received on this member

replicatedMap.numberOfOtherOperations

count

Total number of other operations on this member

replicatedMap.ownedEntryCount

count

Number of entries owned on this member

replicatedMap.ownedEntryMemoryCost

bytes

Memory cost of owned entries on this member

replicatedMap.putCount

count

Number of put operations on this member

replicatedMap.removeCount

count

Number of remove operations on this member

replicatedMap.totalGetLatencies

ms

Total latency of get operations

replicatedMap.totalPutLatencies

ms

Total latency of put operations

replicatedMap.totalRemoveLatencies

ms

Total latency of remove operations

replicatedMap.total

count

Total number of operations on this member

Cache

Name

Unit

Description

cache.averageGetTime

µs

Mean time to execute gets on the cache

cache.averagePutTime

µs

Mean time to execute puts on the cache

cache.averageRemovalTime

µs

Mean time to execute removes on the cache

cache.cacheEvictions

count

Number of evictions on the cache

cache.cacheGets

count

Number of gets on the cache

cache.cacheHits

count

Number of successful get operations, hits, on the cache

cache.cacheHitPercentage

percent

Percentage of successful get operations, hits, out of all get operations on the cache

cache.cachePuts

count

Number of puts to the cache

cache.cacheRemovals

count

Number of removals from the cache

cache.cacheMisses

count

Number of missed cache accesses on the cache

cache.cacheMissPercentage

percent

Percentage of missed cache accesses out of all the cache accesses/access attempts

cache.creationTime

ms

Creation time of the cache on the member

cache.lastAccessTime

ms

Cache’s last access time

cache.lastUpdateTime

ms

Cache’s last update time

cache.ownedEntryCount

count

Locally owned entry count in the cache
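The two percentage metrics relate to the counters in the same table. The following sketch recomputes them from cache.cacheHits, cache.cacheMisses and cache.cacheGets. It assumes that "all the cache accesses" corresponds to cache.cacheGets and that percentages are expressed on a 0 to 100 scale. The class and method names are hypothetical.

```java
final class CacheRatios {
    /** Hits as a percentage of all get operations. */
    static double hitPercentage(long cacheHits, long cacheGets) {
        return cacheGets == 0 ? 0.0 : 100.0 * cacheHits / cacheGets;
    }

    /** Misses as a percentage of all get operations (assumed to be the access count). */
    static double missPercentage(long cacheMisses, long cacheGets) {
        return cacheGets == 0 ? 0.0 : 100.0 * cacheMisses / cacheGets;
    }

    public static void main(String[] args) {
        // Hypothetical readings: 1000 gets, of which 750 were hits and 250 were misses.
        System.out.println(hitPercentage(750, 1_000));  // 75.0
        System.out.println(missPercentage(250, 1_000)); // 25.0
    }
}
```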

Queue

Name

Unit

Description

queue.averageAge

ms

Average age of the items in this member

queue.backupItemCount

count

Number of backup items held by the member

queue.creationTime

ms

Creation time of the queue on the member

queue.eventOperationCount

count

Number of event operations

queue.maxAge

ms

Maximum age of the items in this member

queue.minAge

ms

Minimum age of the items in this member

queue.numberOfEmptyPolls

count

Number of null returning poll operations

queue.numberOfEvents

count

Number of event operations (duplicate of eventOperationCount)

queue.numberOfOffers

count

Number of offer/put/add operations

queue.numberOfOtherOperations

count

Number of other operations

queue.numberOfPolls

count

Number of poll/take/remove operations.

queue.numberOfRejectedOffers

count

Number of rejected offers

queue.ownedItemCount

count

Number of owned items in this member

queue.total

count

Total number of operations (numberOfOffers + numberOfPolls + numberOfOtherOperations)
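The formula given for queue.total can be written out as a one-line helper, which is handy when a monitoring backend stores the three counters but not the total. The class and method names below are hypothetical.

```java
final class QueueTotals {
    /** queue.total = numberOfOffers + numberOfPolls + numberOfOtherOperations. */
    static long total(long numberOfOffers, long numberOfPolls, long numberOfOtherOperations) {
        return numberOfOffers + numberOfPolls + numberOfOtherOperations;
    }

    public static void main(String[] args) {
        // Hypothetical counter readings from one member.
        System.out.println(total(1_200, 1_150, 30)); // 2380
    }
}
```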

Set

Name

Unit

Description

set.creationTime

ms

Creation time of the set on the member

set.lastAccessTime

ms

Last access (read) time of the locally owned items

set.lastUpdateTime

ms

Last update time of the locally owned items

List

Name

Unit

Description

list.creationTime

ms

Creation time of this list on the member

list.lastAccessTime

ms

Last access (read) time of the locally owned items

list.lastUpdateTime

ms

Last update time of the locally owned items

Topic

Name

Unit

Description

topic.creationTime

ms

Creation time of the topic on the member

topic.totalPublishes

count

Total number of published messages of this topic on this member

topic.totalReceivedMessages

count

Total number of received messages of this topic on this member

Reliable Topic

Name

Unit

Description

reliableTopic.creationTime

ms

Creation time of this reliable topic on the member

reliableTopic.totalPublishes

count

Total number of published messages of this reliable topic on this member

reliableTopic.totalReceivedMessages

count

Total number of received messages of this reliable topic on this member

Flake ID Generator

Name

Unit

Description

flakeIdGenerator.batchCount

count

Total number of times the Flake ID generator has been used to generate a new ID batch

flakeIdGenerator.creationTime

ms

Creation time of this Flake ID Generator on the member

flakeIdGenerator.idCount

count

Total number of IDs generated (the sum of IDs for all batches)

PN Counter

Name

Unit

Description

pnCounter.creationTime

ms

Creation time of the PN counter on the member

pnCounter.totalDecrementOperationCount

count

Number of subtract (including decrement) operations on this PN counter

pnCounter.totalIncrementOperationCount

count

Number of add (including increment) operations on this PN counter

pnCounter.value

count

Current value of the PN counter

Executor Service

Name

Unit

Description

executor.cancelled

count

Number of cancelled operations on the executor service

executor.completed

count

Number of completed operations on the executor service

executor.creationTime

ms

Creation time of this executor on the member

executor.pending

count

Number of pending operations on the executor service

executor.started

count

Number of started operations on the executor service

executor.totalExecutionTime

ms

Total execution time of the finished operations

executor.totalStartLatency

ms

Total start latency of operations started

executor.internal.completedTasks

count

Number of completed tasks by this executor

executor.internal.maximumPoolSize

count

Maximum number of threads in the executor’s thread pool

executor.internal.poolSize

count

Number of threads in the executor’s thread pool

executor.internal.queueSize

count

Number of pending tasks in this executor’s task queue

executor.internal.remainingQueueCapacity

count

Remaining capacity on the executor’s task queue

User Code Deployment

Name

Unit

Description

classloading.loadedClassesCount

count

Number of classes that are currently loaded

classloading.totalLoadedClassesCount

count

Total number of classes that have been loaded since the instance has started execution.

classloading.unloadedClassesCount

count

Total number of unloaded classes.

Cluster

Name

Unit

Description

cluster.clock.clusterStartTime

ms

Start time of the cluster (when the first member in the cluster becomes master, its localClockTime value is saved as clusterStartTime)

cluster.clock.clusterTime

ms

Elapsed time since the master member was created (cluster.clock.clusterStartTime)

cluster.clock.clusterTimeDiff

ms

Difference between the local time (cluster.clock.localClockTime) of the local member and the master member

cluster.clock.clusterUpTime

ms

Uptime of the cluster (current time - cluster.clock.clusterStartTime)

cluster.clock.localClockTime

ms

Member’s local clock timestamp

cluster.clock.maxClusterTimeDiff

ms

Maximum observed cluster time difference

cluster.connection.closedTime

count

Connection close time for this connection

cluster.connection.connectionId

count

Connection ID for this client connection

cluster.connection.eventHandlerCount

count

Number of event handlers for this client connection

cluster.heartbeat.lastHeartbeat

ms

Last time that this member sent a heartbeat to other known cluster members

cluster.size

count

Number of members in the cluster
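The relationship stated for cluster.clock.clusterUpTime (current time minus cluster.clock.clusterStartTime) can be illustrated with java.time.Duration, which also makes the millisecond value easier to read. The class and method names and the sample timestamps are hypothetical.

```java
import java.time.Duration;

final class ClusterUptime {
    /** Uptime as described for cluster.clock.clusterUpTime: current time - clusterStartTime. */
    static Duration upTime(long currentTimeMillis, long clusterStartTimeMillis) {
        return Duration.ofMillis(currentTimeMillis - clusterStartTimeMillis);
    }

    public static void main(String[] args) {
        long start = 1_700_000_000_000L;   // hypothetical cluster.clock.clusterStartTime
        long now = start + 93_784_000L;    // 1 day, 2 hours, 3 minutes and 4 seconds later
        System.out.println(upTime(now, start)); // PT26H3M4S
    }
}
```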

Clients

Name

Unit

Description

client.endpoint.count

count

Number of active client endpoints for this member

client.endpoint.totalRegistrations

count

Total number of client endpoint registrations

Client Invocations

Name

Unit

Description

invocations.maxCurrentInvocations

count

Maximum number of concurrent client invocations

invocations.pendingCalls

count

Number of pending client invocations on this client

invocations.startedInvocations

count

Number of started client invocations on this client

CP Subsystem

Name

Unit

Description

cp.atomiclong.value

count

Value of this IAtomicLong

cp.countdownlatch.count

count

Initial count of ICountDownLatch

cp.countdownlatch.remaining

count

Remaining number of expected countdowns

cp.countdownlatch.round

count

Round number of the ICountDownLatch; each time ICountDownLatch is initialized with a new count after it counts down to zero, a new round begins

cp.lock.acquireLimit

count

Maximum number of reentrant acquires of this FencedLock

cp.lock.lockCount

count

Total number of times this FencedLock has been acquired since its creation

cp.lock.owner

count

Address of the FencedLock owner

cp.lock.ownerSessionId

count

Session ID of the FencedLock owner

cp.semaphore.available

count

Number of the remaining available permits

cp.semaphore.initialized

count

State value that shows whether the semaphore is initialized with a value; in exposed metrics, it shows 0 when the semaphore is not initialized and a positive value otherwise

cp.map.size

count

Number of keys in this CPMap

cp.map.sizeBytes

count

Total number of bytes used by key-value pairs for this CPMap

cp.session.creationTime

ms

Creation time of this session

cp.session.endpoint

Address of the endpoint which the CP session belongs to

cp.session.endpointType

Type of the endpoint; either SERVER or CLIENT

cp.session.expirationTime

ms

Expiration time of the CP session

cp.session.version

count

Version number of the CP session; it shows how many times the session heartbeat has been received

Events

Name

Unit

Description

event.eventQueueSize

count

Total number of events pending to be processed

event.eventsProcessed

count

Total number of processed events

event.listenerCount

count

Number of subscribed listeners for the specified service

event.publicationCount

count

Number of published events for the specified service

event.queueCapacity

count

Queue capacity of the executor processing the events (this capacity is shared for all events)

event.rejectedCount

count

Number of rejected events; if an event is not accepted by the executor within hazelcast.event.queue.timeout.millis (see System Properties), it is rejected and not processed

event.syncDeliveryFailureCount

count

Number of failures of sync event delivery

event.threadCount

count

Number of threads for the event service executor (the event thread count)

event.totalFailureCount

count

Number of events that fail to be published

Listeners

Name

Unit

Description

listeners.eventsProcessed

count

Total number of processed events on the client listener service

listeners.eventQueueSize

count

Total number of tasks pending to be processed on the client listener service

Capacity

Name

Unit

Description

file.partition.freeSpace

bytes

Amount of free space in the given directory, user.home

file.partition.totalSpace

bytes

Amount of total space in the given directory, user.home

file.partition.usableSpace

bytes

Amount of usable space in the given directory, user.home

Garbage Collection

Name

Unit

Description

gc.majorCount

count

Total number of major garbage collections (GCs) that have occurred

gc.majorTime

ms

Accumulated elapsed time in major GCs

gc.minorCount

count

Total number of minor GCs that have occurred

gc.minorTime

ms

Accumulated elapsed time in minor GCs

gc.unknownCount

count

Number of unknown GCs that cannot be determined as minor or major (this is usually because the garbage collector in use is not supported)

gc.unknownTime

ms

Accumulated elapsed time in unknown GCs

Memory

Name

Unit

Description

memory.committedHeap

bytes

Amount of heap memory that is committed for the JVM to use

memory.committedNative

bytes

Amount of native memory that is committed for current instance (member or client) to use

memory.freeHeap

bytes

Amount of free memory in the JVM of current instance (member or client)

memory.freeNative

bytes

Amount of free native memory in the current instance (member or client)

memory.freePhysical

bytes

Amount of free physical memory available in OS

memory.maxHeap

bytes

Maximum amount of memory that the JVM will attempt to use

memory.maxMetadata

bytes

Amount of native memory reserved for metadata (this memory is separate and not accounted for by the NativeMemory statistics)

memory.maxNative

bytes

Maximum amount of native memory that current instance (member or client) will attempt to use

memory.totalPhysical

bytes

Amount of total physical memory available in OS

memory.usedHeap

bytes

Amount of used memory in the JVM of the current instance (member or client)

memory.usedMetadata

bytes

Amount of used metadata memory by the current instance (member or client)

memory.usedNative

bytes

Amount of used native memory by the current instance (member or client)

Near Cache

Name

Unit

Description

nearcache.creationTime

ms

Creation time of this Near Cache on this instance (member or client)

nearcache.evictions

count

Number of evictions of Near Cache entries owned by this instance (member or client)

nearcache.expirations

count

Number of TTL and max-idle expirations of Near Cache entries owned by this instance (member or client)

nearcache.hits

count

Number of hits (reads) of Near Cache entries owned by this instance (member or client)

nearcache.invalidationRequests

count

Number of invalidation requests received for Near Cache entries owned by this instance (member or client).

nearcache.invalidations

count

Number of invalidations of Near Cache entries owned by this instance (member or client).

nearcache.lastPersistenceDuration

ms

Duration of the last Near Cache key persistence

nearcache.lastPersistenceKeyCount

count

Number of persisted keys of the last Near Cache key persistence

nearcache.lastPersistenceTime

ms

Timestamp of the last Near Cache key persistence

nearcache.lastPersistenceWrittenBytes

bytes

Written bytes of the last Near Cache key persistence

nearcache.misses

count

Number of misses of Near Cache entries owned by this instance (member or client).

nearcache.ownedEntryCount

count

Number of Near Cache entries owned by this instance (member or client)

nearcache.ownedEntryMemoryCost

bytes

Memory cost of Near Cache entries owned by this instance (member or client)

nearcache.persistenceCount

count

Number of Near Cache key persistences (when the preload feature is enabled)

Operations

In the Hazelcast context, priority operations are those that are important for the stability of the cluster, for example heartbeats and migration requests. Normal operations are those that manipulate data, for example map.get and map.put.

Name

Unit

Description

operation.adhoc.executedOperationsCount

count

Number of executed adhoc operations

operation.asyncOperations

count

Number of currently executing async operations on the operation service of the member

operation.completedCount

count

Number of completed operations

operation.failedBackups

count

Number of failed backup operations on the operation service of the member

operation.generic.executedOperationsCount

count

Number of executed generic operations

operation.genericPriorityQueueSize

count

Number of priority generic operations pending (waiting in the priority queue)

operation.genericQueueSize

count

Number of normal generic operations pending (waiting in the queue)

operation.genericThreadCount

count

Number of generic operation handler threads in the member

operation.invocations.backupTimeoutMillis

ms

Operation backup timeout that specifies how long the invocation will wait for acknowledgements from the backup replicas (if acks are not received from some backups, there will not be any rollback on other successful replicas)

operation.invocations.backupTimeouts

count

Number of operation invocations whose acknowledgments from backups have timed out

operation.invocations.delayedExecutionCount

count

Number of times that the operation invocations have been delayed

operation.invocations.heartbeatBroadcastPeriodMillis

ms

Broadcast period of operation heartbeats (these heartbeat packets are sent to inform the other members whether an operation is still alive). The heartbeat period is configured to be 1/4 of the call timeout. So, with default settings, every 15 seconds every member in the cluster notifies every other member in the cluster about all pending calls.

operation.invocations.heartbeatPacketsReceived

count

Number of received heartbeat packets

operation.invocations.heartbeatPacketsSent

count

Number of sent heartbeat packets

operation.invocations.invocationScanPeriodMillis

ms

Period for scanning over pending invocations for getting rid of duplicates, checking for heartbeat timeout, and checking for backup timeout

operation.invocations.invocationTimeoutMillis

ms

Timeout for operation invocations

operation.invocations.lastCallId

count

Last issued invocation call ID

operation.invocations.normalTimeouts

count

Number of times that the operation invocations have timed out

operation.invocations.pending

count

Number of pending invocations

operation.invocations.usedPercentage

percent

Usage percentage of the operation invocation capacity (pending invocations / max concurrent invocations)

operation.parker.parkQueueCount

count

Number of separate WaitSets (sets of operations waiting for some condition)

operation.parker.totalParkedOperationCount

count

Total number of parked operations

operation.partition.executedOperationsCount

count

Number of executed partition operations on the specified partition

operation.partitionThreadCount

count

Number of partition operation handler threads in the member

operation.priorityQueueSize

count

Number of priority operations pending (priority partition operations + priority generic operations)

operation.queueSize

count

Number of normal operations pending (normal partition operations + normal generic operations).

It refers to the number of operations sent to the member that have yet to be consumed for processing by the partition operation threads. This is the most critical queue for partition-aware operations such as map.put and map.remove. This value should be zero or very close to zero. Depending on the latency tolerance of your use case, you can define an alert threshold in your preferred alerting mechanism. For instance, triggering an alert if this value stays above 100 for 15 seconds would be useful.

operation.responseQueueSize

count

Total number of pending responses (work queue for the response threads) to be processed.

operation.responses.backupCount

count

Number of backup acknowledgement responses

operation.responses.errorCount

count

Number of error responses

operation.responses.missingCount

count

Number of responses having missing invocations

operation.responses.normalCount

count

Number of normal responses

operation.responses.timeoutCount

count

Number of call timeout responses

operation.retryCount

count

Number of retried operations

operation.runningCount

count

Number of currently running operations (runningPartitionCount + runningGenericCount)

operation.runningGenericCount

count

Number of currently running generic (non partition specific) operations

operation.runningPartitionCount

count

Number of currently running partition operations

operation.thread.completedOperationCount

count

Number of completed operations by this operation thread

operation.thread.completedOperationBatchCount

count

Number of TaskBatch instances (batches of tasks) completed by this operation thread

operation.thread.completedPacketCount

count

Number of packets executed by this operation thread

operation.thread.completedPartitionSpecificRunnableCount

count

Number of PartitionSpecificRunnable tasks executed by this operation thread

operation.thread.completedRunnableCount

count

Total number of runnables executed by this operation thread

operation.thread.completedTotalCount

count

Total number of tasks (Operation + PartitionSpecificRunnable + Runnable + TaskBatch) completed on this operation thread

operation.thread.errorCount

count

Total number of failed tasks on this operation thread

operation.thread.normalPendingCount

count

Number of normal pending operations (tasks)

operation.thread.priorityPendingCount

count

Number of priority pending operations (tasks)
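
The alerting guidance given for operation.queueSize (alert if the value stays above 100 for 15 seconds) can be sketched as a sustained-threshold check. This is a minimal illustration, not part of Hazelcast: the function name `sustained_breach` and the sample format (a list of timestamp/value pairs collected by your own monitoring tool) are assumptions.

```python
def sustained_breach(samples, threshold=100, window_seconds=15):
    """Return True if the metric stayed above `threshold` for the whole window.

    `samples` is a hypothetical list of (timestamp_seconds, value) tuples in
    ascending time order, e.g. periodic readings of operation.queueSize.
    """
    if not samples:
        return False
    latest = samples[-1][0]
    # The samples must cover at least the full window; otherwise there is
    # not enough evidence to call the breach "sustained".
    if latest - samples[0][0] < window_seconds:
        return False
    window = [value for ts, value in samples if ts >= latest - window_seconds]
    return all(value > threshold for value in window)


if __name__ == "__main__":
    readings = [(0, 120), (5, 130), (10, 150), (15, 110)]
    print(sustained_breach(readings))
```

Requiring every reading in the window to exceed the threshold avoids alerting on short spikes, which are normal under bursty load.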

Operating System

Name

Unit

Description

os.committedVirtualMemorySize

bytes

Amount of committed virtual memory (that is, the amount of virtual memory guaranteed to be available to the running process)

os.freePhysicalMemorySize

bytes

Amount of free physical memory

os.freeSwapSpaceSize

bytes

Amount of free swap space

os.maxFileDescriptorCount

count

Maximum number of open file descriptors (only for UNIX platforms)

os.openFileDescriptorCount

count

Number of open file descriptors (only for UNIX platforms)

os.processCpuLoad

percent

Recent CPU usage for the JVM process; a negative value if not available

os.processCpuTime

ms

CPU time used by the process on which the JVM is running

os.systemCpuLoad

percent

Recent CPU usage for the whole system; a negative value if not available

os.systemLoadAverage

percent

System load average for the last minute, or a negative value if not available

os.totalPhysicalMemorySize

bytes

Total amount of physical memory

os.totalSwapSpaceSize

bytes

Total amount of swap space
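
Two of the operating system metrics above are easier to act on after a small transformation: the file descriptor counts can be combined into a utilization ratio, and the CPU load metrics use a negative value to signal "not available". The sketch below assumes the metrics have already been collected into a plain dictionary keyed by metric name; the helper names are hypothetical.

```python
def fd_utilization(metrics):
    """Fraction of file descriptors in use, or None when not reported
    (the descriptor metrics are only available on UNIX platforms)."""
    open_fds = metrics.get("os.openFileDescriptorCount")
    max_fds = metrics.get("os.maxFileDescriptorCount")
    if open_fds is None or max_fds is None or max_fds <= 0:
        return None
    return open_fds / max_fds


def cpu_load_or_none(metrics, name="os.processCpuLoad"):
    """Return the CPU load, mapping the negative 'not available' value to None."""
    value = metrics.get(name)
    if value is None or value < 0:
        return None
    return value


if __name__ == "__main__":
    sample = {"os.openFileDescriptorCount": 256, "os.maxFileDescriptorCount": 1024}
    print(fd_utilization(sample))
```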

Partitions

Name

Unit

Description

partitions.activePartitionCount

count

Number of partitions assigned to the member

partitions.completedMigrations

count

Number of completed migrations on the latest repartitioning round

partitions.elapsedDestinationCommitTime

ns

Total elapsed time of commit operations' executions to the destination endpoint on the latest repartitioning round

partitions.elapsedMigrationOperationTime

ns

Total elapsed time of migration & replication operations' executions from source to destination endpoints on the latest repartitioning round

partitions.elapsedMigrationTime

ns

Total elapsed time from the start of migration tasks to their completion (successful or otherwise) on the latest repartitioning round

partitions.lastRepartitionTime

ms

Time at which the latest repartitioning took place

partitions.localPartitionCount

count

Number of partitions currently owned by the member

partitions.maxBackupCount

count

Maximum allowed backup count according to current cluster formation and partition group configuration

partitions.memberGroupsSize

count

Number of the member groups to be used in partition assignments

partitions.migrationActive

boolean

Whether there are any currently active migration tasks

partitions.migrationQueueSize

count

Number of migration tasks in the migration queue

partitions.partitionCount

count

Total partition count

partitions.plannedMigrations

count

Number of planned migrations on the latest repartitioning round

partitions.replicaSyncRequestsCounter

count

Number of replica sync requests

partitions.replicaSyncSemaphore

count

Permits count of this replica sync semaphore

partitions.stateStamp

count

Stamp value for the current partition table; the stamp is calculated by hashing the individual partition versions using MurmurHash3 (the initial value, 0L, means that the partition table is not initialized yet)

partitions.totalCompletedMigrations

count

Total number of completed migrations

partitions.totalElapsedDestinationCommitTime

ns

Total elapsed time of commit operations' executions to the destination endpoint since the beginning

partitions.totalElapsedMigrationOperationTime

ns

Total elapsed time of migration & replication operations' executions from source to destination endpoints since the beginning

partitions.totalElapsedMigrationTime

ns

Total elapsed time from the start of migration tasks to their completion (successful or otherwise) since the beginning
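
As an illustration of how the per-round migration metrics might be combined, the sketch below derives a rough progress figure from partitions.plannedMigrations and partitions.completedMigrations, and checks partitions.stateStamp against its documented initial value. Treating completed/planned as "progress" is an interpretation, not something the table states, and the helper names and dictionary input are hypothetical.

```python
def migration_progress(metrics):
    """Rough progress of the latest repartitioning round, in the range 0..1.

    Returns None when no migrations were planned in that round.
    """
    planned = metrics.get("partitions.plannedMigrations", 0)
    completed = metrics.get("partitions.completedMigrations", 0)
    if planned <= 0:
        return None
    return min(completed / planned, 1.0)


def partition_table_initialized(metrics):
    """A stamp of 0 is the initial value: the partition table is not set up yet."""
    return metrics.get("partitions.stateStamp", 0) != 0


if __name__ == "__main__":
    sample = {"partitions.plannedMigrations": 40, "partitions.completedMigrations": 10}
    print(migration_progress(sample))
```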

Persistence

Name

Unit

Description

persistence.liveTombstones

count

Number of live tombstones in the persistent store

persistence.liveValues

count

Number of live values in the persistent store

persistence.tombGarbage

bytes

Approximate size of the garbage within the tombstone chunks (it does not account for data in the active chunk; it is incremented when a record is retired or an active chunk is turned into a stable one)

persistence.tombOccupancy

bytes

Approximate size of tombstone chunks (it does not account for data in the active chunk; it is incremented when the active chunk is turned into a stable one)

persistence.valGarbage

bytes

Approximate size of the garbage within the value chunks (it does not account for data in the active chunk; it is incremented when a record is retired or an active chunk is turned into a stable one)

persistence.valOccupancy

bytes

Approximate size of value chunks (it does not account for data in the active chunk; it is incremented when the active chunk is turned into a stable one)
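
The garbage and occupancy metrics are both approximate byte counts over the same set of stable chunks, so one plausible way to read them together is as a garbage ratio. The sketch below assumes that garbage is counted as a portion of occupancy, which the table does not state explicitly; the function name is hypothetical.

```python
def garbage_ratio(garbage_bytes, occupancy_bytes):
    """Share of the occupied chunk space that is garbage, or None if empty.

    Intended for pairs such as persistence.valGarbage / persistence.valOccupancy
    or persistence.tombGarbage / persistence.tombOccupancy.
    """
    if occupancy_bytes <= 0:
        return None
    return garbage_bytes / occupancy_bytes


if __name__ == "__main__":
    print(garbage_ratio(25, 100))
```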

Proxies

Name

Unit

Description

proxy.createdCount

count

Number of created proxies for a given service

proxy.destroyedCount

count

Number of destroyed proxies for a given service

proxy.proxyCount

count

Number of active proxies for a given service

Raft Algorithm

Name

Unit

Description

raft.destroyedGroupIds

count

Number of destroyed Raft node group IDs

raft.group.availableLogCapacity

count

Available log capacity for this CP group

raft.group.commitIndex

count

Commit index of this CP group

raft.group.lastApplied

count

Last applied index of this CP group

raft.group.lastLogIndex

count

Last log index of this CP group

raft.group.lastLogTerm

count

Last log term of this CP group

raft.group.memberCount

count

Number of members in this CP group

raft.group.snapshotIndex

count

Raft snapshot index of this CP group

raft.group.term

count

Raft term of this CP group

raft.metadata.activeMembersCommitIndex

count

Commit index of the active CP members

raft.metadata.activeMembers

count

Number of active CP members

raft.metadata.groups

count

Number of CP groups

raft.missingMembers

count

Number of missing CP members

raft.nodes

count

Number of local Raft nodes

raft.terminatedRaftNodeGroupIds

count

Number of terminated Raft node group IDs
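
In Raft, entries are first committed and then applied to the state machine, so the gap between raft.group.commitIndex and raft.group.lastApplied gives a simple "apply lag" for a CP group. The following sketch computes it from a dictionary of metric values; the function name and the input shape are assumptions made for illustration.

```python
def apply_lag(metrics):
    """Number of committed log entries not yet applied in a CP group."""
    commit_index = metrics.get("raft.group.commitIndex", 0)
    last_applied = metrics.get("raft.group.lastApplied", 0)
    # The two values may be sampled at slightly different instants,
    # so clamp at zero rather than report a negative lag.
    return max(commit_index - last_applied, 0)


if __name__ == "__main__":
    print(apply_lag({"raft.group.commitIndex": 120, "raft.group.lastApplied": 115}))
```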

Runtime

Name

Unit

Description

runtime.availableProcessors

count

Number of processors available to the JVM

runtime.freeMemory

bytes

Amount of free memory in the JVM

runtime.maxMemory

bytes

Maximum amount of memory that the JVM will attempt to use

runtime.totalMemory

bytes

Total amount of memory in the JVM; the value of this metric may vary over time, depending on the host environment

runtime.upTime

ms

Uptime of the JVM

runtime.usedMemory

bytes

Approximation to the total amount of memory currently used
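
The runtime memory metrics relate to each other in the usual JVM way: memory in use is commonly approximated as total memory minus free memory, and that figure can be compared with runtime.maxMemory to estimate headroom. The sketch below shows this arithmetic on already-collected values; it assumes this relationship rather than quoting it from the table, and the helper names are hypothetical.

```python
def used_memory(total_bytes, free_bytes):
    """Approximate memory in use: total JVM memory minus free JVM memory."""
    return total_bytes - free_bytes


def heap_usage_fraction(total_bytes, free_bytes, max_bytes):
    """Used memory as a fraction of the maximum the JVM will attempt to use."""
    if max_bytes <= 0:
        return None
    return used_memory(total_bytes, free_bytes) / max_bytes


if __name__ == "__main__":
    print(heap_usage_fraction(800, 300, 1000))
```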

TCP

Name

Unit

Description

tcp.acceptor.eventCount

count

Total number of the connections accepted by TcpServerAcceptor

tcp.acceptor.exceptionCount

count

Number of thrown exceptions on this TcpServerAcceptor

tcp.acceptor.idleTimeMillis

ms

Idle time that measures how long this TcpServerAcceptor has not received any events

tcp.acceptor.selectorRecreateCount

count

Number of times the selector was recreated

tcp.balancer.imbalanceDetectedCount

count

Number of times the IOBalancer has detected an imbalance of load on the NioThreads

tcp.balancer.migrationCompletedCount

count

Number of completed NioPipeline migrations by the IOBalancer (these migrations are performed to fix the load imbalance problem on the NioThreads)

tcp.bytesReceived

bytes

Number of bytes received over all connections (active and closed)

tcp.bytesSend

bytes

Number of bytes sent over all connections (active and closed)

tcp.connection.acceptedSocketCount

count

Number of accepted socket channels

tcp.connection.activeCount

count

Number of active connections

tcp.connection.clientCount

count

Number of the active client connections

tcp.connection.closedCount

count

Number of closed connections

tcp.connection.connectionListenerCount

count

Number of active connection listeners

tcp.connection.count

count

Number of TcpServerConnection instances

tcp.connection.inProgressCount

count

Number of connection establishments in progress

tcp.connection.openedCount

count

Number of opened connections

tcp.connection.textCount

count

Number of connections used by text-based protocols (REST, Memcache)

tcp.connection.in/out.completedMigrations

count

Number of completed migrations on this pipeline (migrates this pipeline to a different NioThread)

tcp.connection.in/out.opsInterested

count

tcp.connection.in/out.opsReady

count

tcp.connection.in/out.ownerId

count

Owner ID of this NioPipeline, -1 if the pipeline is being migrated (owner is null)

tcp.connection.in/out.processCount

count

Number of times the NioPipeline.process() method has been called

tcp.connection.in/out.startedMigrations

count

Number of started migrations on this pipeline

tcp.connection.in.bytesRead

bytes

Total size of frames read on this inbound pipeline

tcp.connection.in.idleTimeMs

ms

Idle time that indicates how long it has been since the last read on this inbound NIO pipeline

tcp.connection.in.normalFramesRead

count

Number of normal frames read on this inbound NIO pipeline

tcp.connection.in.priorityFramesRead

count

Number of priority frames read

tcp.connection.out.bytesWritten

bytes

Total size of frames written on this outbound pipeline

tcp.connection.out.idleTimeMillis

ms

Idle time that indicates how long it has been since the last write on this outbound NIO pipeline

tcp.connection.out.normalFramesWritten

count

Number of normal frames written on this outbound NIO pipeline

tcp.connection.out.priorityFramesWritten

count

Number of priority frames written on this outbound NIO pipeline

tcp.connection.out.priorityWriteQueuePendingBytes

bytes

Total size of priority frames pending in the write queue

tcp.connection.out.priorityWriteQueueSize

count

Number of priority frames pending in the write queue

tcp.connection.out.scheduled

count

Ordinal of enum state of this outbound pipeline: 0 → UNSCHEDULED, 1 → SCHEDULED, 2 → BLOCKED, 3 → RESCHEDULE

tcp.connection.out.writeQueuePendingBytes

bytes

Total size of normal frames pending in the write queue

tcp.connection.out.writeQueueSize

count

Number of normal frames pending in the write queue

tcp.inputThread/outputThread.bytesTransceived

bytes

Amount of transceived data on this NioThread

tcp.inputThread/outputThread.completedTaskCount

count

Total number of completed tasks on this NioThread

tcp.inputThread/outputThread.eventCount

count

Total number of selection events handled by this NioThread

tcp.inputThread/outputThread.framesTransceived

count

Number of transceived frames on this NioThread

tcp.inputThread/outputThread.idleTimeMillis

ms

Idle time that indicates the duration since the last read/write

tcp.inputThread/outputThread.ioThreadId

count

Thread ID of this NioThread

tcp.inputThread/outputThread.priorityFramesTransceived

count

Number of transceived priority frames

tcp.inputThread/outputThread.processCount

count

Number of NioPipelines processed on this NioThread

tcp.inputThread/outputThread.selectorIOExceptionCount

count

Number of times that I/O exceptions are thrown during selection

tcp.inputThread/outputThread.taskQueueSize

count

Number of pending tasks on the queue of NioThread
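
Because tcp.connection.out.scheduled is reported as an enum ordinal, dashboards are easier to read if the number is decoded back into a state name. The sketch below mirrors the ordinal mapping listed in the table (0 to 3); the Python enum and the `decode_scheduled` helper are illustrative names, not Hazelcast classes.

```python
from enum import IntEnum


class OutboundState(IntEnum):
    """Ordinals as listed for tcp.connection.out.scheduled."""
    UNSCHEDULED = 0
    SCHEDULED = 1
    BLOCKED = 2
    RESCHEDULE = 3


def decode_scheduled(value):
    """Map the reported ordinal to its state name, or 'UNKNOWN' if out of range."""
    try:
        return OutboundState(int(value)).name
    except ValueError:
        return "UNKNOWN"


if __name__ == "__main__":
    print(decode_scheduled(2))
```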

Threads

Name

Unit

Description

thread.daemonThreadCount

count

Current number of live daemon threads in the JVM

thread.peakThreadCount

count

Peak live thread count since the JVM started

thread.threadCount

count

Current number of live threads including both daemon and non-daemon threads in the JVM

thread.totalStartedThreadCount

count

Total number of threads started since the JVM started

Transactions

Name

Unit

Description

transactions.commitCount

count

Number of committed transactions

transactions.rollbackCount

count

Number of rolled-back transactions

transactions.startCount

count

Number of started transactions

Tiered Store

Name

Unit

Description

tstore.device.freeSpace

bytes

Amount of free space in the device directory

tstore.device.maxSpace

bytes

Amount of total space in the device directory

tstore.device.usage

bytes

Amount of space in the device directory used by Hybrid Log files

tstore.device.usedSpace

bytes

Amount of used space in the device directory

tstore.hlog.allocation.per.page.avg

count

Average number of Hybrid Log allocations per page

tstore.hlog.allocation.per.page.max

count

Maximum number of Hybrid Log allocations per page

tstore.hlog.allocation.per.page.min

count

Minimum number of Hybrid Log allocations per page

tstore.hlog.allocation.size.avg

bytes

Average Hybrid Log allocation size

tstore.hlog.allocation.size.max

bytes

Maximum Hybrid Log allocation size

tstore.hlog.allocation.size.min

bytes

Minimum Hybrid Log allocation size

tstore.hlog.allocation.size.total

bytes

Total size of Hybrid Log allocations

tstore.hlog.allocation.stall.avg

ns

Average time spent on stalled allocation for Hybrid Log

tstore.hlog.allocation.stall.max

ns

Maximum time spent on stalled allocation for Hybrid Log

tstore.hlog.allocation.stall.min

ns

Minimum time spent on stalled allocation for Hybrid Log

tstore.hlog.allocation.stall.total

ns

Total time spent on stalled allocations for Hybrid Log

tstore.hlog.compaction.count

count

Number of finished Hybrid Log compactions (successful and failed)

tstore.hlog.compaction.failed.count

count

Number of failed Hybrid Log compactions

tstore.hlog.compaction.inProgress.count

count

Number of Hybrid Log compactions currently in progress

tstore.hlog.compaction.ioTime.total

ns

Time spent on I/O during compaction

tstore.hlog.compaction.queue.count

count

Number of Hybrid Log compactions currently waiting in the queue

tstore.hlog.compaction.queueTime.avg

ns

Average time for which Hybrid Log compaction has been waiting in the queue

tstore.hlog.compaction.queueTime.max

ns

Maximum time for which Hybrid Log compaction has been waiting in the queue

tstore.hlog.compaction.queueTime.min

ns

Minimum time for which Hybrid Log compaction has been waiting in the queue

tstore.hlog.compaction.queueTime.total

ns

Total time for which Hybrid Log compactions have been waiting in the queue

tstore.hlog.compaction.time.avg

ns

Average time for which Hybrid Log compaction has been executing

tstore.hlog.compaction.time.max

ns

Maximum time for which Hybrid Log compaction has been executing

tstore.hlog.compaction.time.min

ns

Minimum time for which Hybrid Log compaction has been executing

tstore.hlog.compaction.time.total

ns

Total time for which Hybrid Log compactions have been executing

tstore.hlog.length

bytes

Current size of the Hybrid Log

tstore.hlog.pageWriteDuration.avg

ns

Average time it took to write a page to the device

tstore.hlog.pageWriteDuration.max

ns

Maximum time it took to write a page to the device

tstore.hlog.pageWriteDuration.min

ns

Minimum time it took to write a page to the device

tstore.hlog.paging.frequency.avg

ns

Average time between consecutive Hybrid Log page allocations

tstore.hlog.paging.frequency.max

ns

Maximum time between consecutive Hybrid Log page allocations

tstore.hlog.paging.frequency.min

ns

Minimum time between consecutive Hybrid Log page allocations

tstore.hlog.readRecordDuration.avg

ns

Average time it took to read a record from the device

tstore.hlog.readRecordDuration.max

ns

Maximum time it took to read a record from the device

tstore.hlog.readRecordDuration.min

ns

Minimum time it took to read a record from the device

tstore.hlog.readRecord.hits

count

Number of times when requested record was in memory

tstore.hlog.readRecord.misses

count

Number of times when requested record was not in memory

tstore.hlog.readRecord.hit.percent

percent

Percent of times when requested record was in memory

tstore.hlog.readRecord.miss.percent

percent

Percent of times when requested record was not in memory

tstore.hlog.waste.alignment.avg

bytes

Average space wasted due to alignment of Hybrid Log allocation

tstore.hlog.waste.alignment.max

bytes

Maximum space wasted due to alignment of Hybrid Log allocation

tstore.hlog.waste.alignment.min

bytes

Minimum space wasted due to alignment of Hybrid Log allocation

tstore.hlog.waste.alignment.total

bytes

Total space wasted due to alignment of Hybrid Log allocations

tstore.hlog.waste.paging.avg

bytes

Average space wasted due to crossing page boundaries of Hybrid Log allocation

tstore.hlog.waste.paging.max

bytes

Maximum space wasted due to crossing page boundaries of Hybrid Log allocation

tstore.hlog.waste.paging.min

bytes

Minimum space wasted due to crossing page boundaries of Hybrid Log allocation

tstore.hlog.waste.paging.total

bytes

Total space wasted due to crossing page boundaries of Hybrid Log allocations
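
The hit and miss percentages for record reads can be cross-checked against the raw counters tstore.hlog.readRecord.hits and tstore.hlog.readRecord.misses. The sketch below shows the arithmetic, assuming the percentage is simply hits over all reads; the function name is hypothetical.

```python
def hit_percent(hits, misses):
    """Percentage of record reads served from memory, or None with no reads."""
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


if __name__ == "__main__":
    print(hit_percent(75, 25))
```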

WAN Replication

Name

Unit

Description

wan.ackDelayCurrentMillis

ms

Duration of the ongoing acknowledgment delay; -1 if no delay is currently in progress

wan.ackDelayLastEnd

ms

Timestamp of the last time acknowledgment delaying ended; if this value is greater than wan.ackDelayLastStart, no delaying is in progress

wan.ackDelayLastStart

ms

Timestamp of the last time acknowledgment delaying started

wan.ackDelayTotalCount

count

Total number of times that delaying of the WAN acknowledgments was triggered (by exceeding the invocation threshold)

wan.ackDelayTotalMillis

ms

Total amount of time during which the WAN acknowledgments were delayed

wan.consistencyCheck.lastCheckedPartitionCount

count

Number of checked partitions on the last WAN consistency check

wan.consistencyCheck.lastCheckedLeafCount

count

Number of checked Merkle tree leaves on the last WAN consistency check

wan.consistencyCheck.lastDiffLeafCount

count

Number of different Merkle tree leaves on the last WAN consistency check

wan.consistencyCheck.lastDiffPartitionCount

count

Number of partitions found to be inconsistent on the last WAN consistency check

wan.consistencyCheck.lastEntriesToSync

count

Number of entries to synchronize to get the clusters into sync on the last WAN consistency check

wan.droppedCount

count

Number of dropped entry events

wan.outboundQueueSize

count

Outbound WAN queue size on this member

wan.removeCount

count

Number of entry remove events

wan.syncCount

count

Number of entry sync events

wan.sync.avgEntriesPerLeaf

count

Average number of records belonging to the synchronized Merkle tree nodes

wan.sync.maxLeafEntryCount

count

Maximum number of records belonging to the synchronized Merkle tree nodes

wan.sync.minLeafEntryCount

count

Minimum number of records belonging to the synchronized Merkle tree nodes

wan.sync.nodesSynced

count

Number of synchronized Merkle tree nodes

wan.sync.partitionsSynced

count

Number of synchronized partitions

wan.sync.partitionsToSync

count

Number of partitions to synchronize

wan.sync.recordsSynced

count

Number of synchronized records

wan.sync.syncDurationNanos

ns

Duration of the last synchronization

wan.sync.stdDevEntriesPerLeaf

count

Standard deviation of the number of records belonging to the synchronized Merkle tree nodes

wan.sync.syncStartNanos

ns

Start time of this WAN synchronization

wan.totalPublishLatency

ms

Total latency of published WAN events from this member

wan.totalPublishedEventCount

count

Total number of published WAN events from this member

wan.updateCount

count

Number of entry update events

wan.connectionHealth

boolean

The health of an individual WAN target endpoint, where 1 is healthy and 0 is not

wan.failedTransmitCount

count

Number of attempted WAN replication transmissions that have failed
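
Two derived values are often useful when reading the WAN metrics: whether acknowledgment delaying is currently in progress (per the table, it is not when wan.ackDelayLastEnd is greater than wan.ackDelayLastStart), and the average publish latency per event. The sketch below computes both from a dictionary of collected values; the helper names are hypothetical, and dividing total latency by event count is an assumed interpretation of the two totals.

```python
def ack_delay_in_progress(metrics):
    """True when the last delay started after the last recorded delay end."""
    start = metrics.get("wan.ackDelayLastStart", 0)
    end = metrics.get("wan.ackDelayLastEnd", 0)
    return end < start


def average_publish_latency_ms(metrics):
    """Mean latency per published WAN event, or None if nothing was published."""
    count = metrics.get("wan.totalPublishedEventCount", 0)
    if count <= 0:
        return None
    return metrics.get("wan.totalPublishLatency", 0) / count


if __name__ == "__main__":
    sample = {"wan.totalPublishLatency": 500, "wan.totalPublishedEventCount": 100}
    print(average_publish_latency_ms(sample))
```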

User Code Namespaces

Name

Unit

Description

ucn.updateTime

ms

Update time of a user code namespace configuration

ucn.resourceCount

count

Number of resources contained in a namespace

ucn.resource.resourceType

enum

Ordinal of the enum type of a resource contained in a namespace

ucn.resource.resourceSizeBytes

bytes

The size in bytes of a resource contained in a namespace