Hazelcast Simulator is a high-performance, production-grade testing framework for running performance, stress, and latency tests on Hazelcast clusters. It allows you to simulate complex workloads and evaluate distributed systems in a realistic and reproducible manner.
Hazelcast Simulator is designed for:
- Validating new features.
- Detecting regressions.
- Measuring throughput and latency under varying loads.
- Simulating real-world failures (such as network latency and node crashes).
- Running tests against cloud-deployed Hazelcast clusters.
Key capabilities
- Supports both throughput-oriented and latency-oriented testing modes.
- Orchestrates tests across multiple clients and members, with configurable topologies.
- Provides out-of-the-box support for static infrastructure.
- Supports automatic provisioning of cloud infrastructure (currently AWS).
- Integrates with monitoring tools and profilers for system-level analysis.
- Enables custom test logic with a flexible Java-based test API.
Supported test types
Simulator enables the execution of a variety of performance test types:
- Load Tests: Measure combined throughput and latency under realistic load.
- Max Throughput Tests: Identify system saturation points under increasing concurrency.
- Latency Tests: Measure operation response time at fixed request rates.
- Spike and Soak Tests: Evaluate short bursts and long-term stability.
- Stress Tests: Push the system until failure to observe limits.
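To make the idea of a latency test at a fixed request rate more concrete, here is a minimal, self-contained sketch. It is not Simulator code: `fixed_rate_latencies` is a hypothetical helper that simulates a single worker and measures each latency from the request's intended start time, so that time spent waiting behind a slow request is counted as well.

```python
def fixed_rate_latencies(rate_per_second, service_times_s):
    """Simulate one worker issuing requests at a fixed rate.

    Hypothetical helper for illustration only. Each latency is measured
    from the request's *intended* start time, so queueing delay caused by
    an earlier slow request is included in the result.
    """
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    interval = 1.0 / rate_per_second
    worker_free_at = 0.0  # simulated time at which the worker is idle again
    latencies = []
    for i, service_time in enumerate(service_times_s):
        intended_start = i * interval
        actual_start = max(worker_free_at, intended_start)
        worker_free_at = actual_start + service_time
        latencies.append(worker_free_at - intended_start)
    return latencies


# 10 requests per second (one every 100 ms); the second request stalls for 250 ms.
latencies = fixed_rate_latencies(10, [0.01, 0.25, 0.01, 0.01])
for i, latency in enumerate(latencies):
    print(f"request {i}: {latency * 1000:.0f} ms")
```

In this made-up run the third and fourth requests each take only 10 ms of service time, yet their measured latency is much higher because they were delayed by the stalled request. Measuring from the actual start time would hide that delay.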
Performance testing strategy
Before executing tests:
- Use a test plan to define test scope, goals, and configurations.
- Choose appropriate machine types, network configuration, and topology.
- Consider persistence, CPU, memory, and network bandwidth requirements.
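As a rough aid for the sizing considerations above, the following back-of-the-envelope sketch estimates network bandwidth and data volume from planned load. Both function names, the `overhead_factor` of 1.2, and the workload numbers are assumptions chosen for illustration; real requirements depend on serialization, protocol overhead, and cluster configuration, and should be confirmed by measurement.

```python
def estimate_network_mbps(rate_per_second, payload_bytes, overhead_factor=1.2):
    """Estimate required bandwidth in Mbit/s for a given request rate.

    overhead_factor is an assumed allowance for protocol and framing
    overhead, not a measured value.
    """
    bits_per_second = rate_per_second * payload_bytes * 8 * overhead_factor
    return bits_per_second / 1_000_000


def estimate_data_gib(entry_count, entry_bytes, backup_count=1):
    """Estimate total stored data in GiB, counting primary and backup copies."""
    total_bytes = entry_count * entry_bytes * (1 + backup_count)
    return total_bytes / 2**30


# Illustrative plan: 50,000 operations/s with 1 KiB payloads,
# and 10 million 1 KiB entries with one backup copy each.
mbps = estimate_network_mbps(50_000, 1024)
gib = estimate_data_gib(10_000_000, 1024, backup_count=1)
print(f"network: about {mbps:.1f} Mbit/s, data: about {gib:.2f} GiB")
```

Estimates like these are only a starting point for choosing machine types and network configuration; they say nothing about CPU cost or persistence throughput.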
Executing tests:
- Start with a small cluster and progressively scale load by adjusting parameters such as `threadCount` and `ratePerSecond`.
- Continuously monitor throughput (`TPS`), latency percentiles, CPU utilization, memory consumption, and other relevant system metrics.
- Use Hazelcast Simulator’s latency measurement and ramp-up utilities to apply load in a controlled and reproducible manner.
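The progressive scaling described above can be sketched as a simple planning script. This is a minimal illustration, not part of Simulator: `ramp_up_steps` and `saturation_point` are hypothetical helpers, the doubling factor and the 5% gain threshold are assumptions, and the throughput figures are invented. Each generated level would be used as the `threadCount` (or `ratePerSecond`) of one test run.

```python
def ramp_up_steps(start, maximum, factor=2):
    """Return increasing load levels from start up to maximum (inclusive)."""
    if start <= 0 or start > maximum:
        raise ValueError("require 0 < start <= maximum")
    if factor <= 1:
        raise ValueError("factor must be greater than 1")
    steps = []
    level = start
    while level < maximum:
        steps.append(level)
        level *= factor
    steps.append(maximum)  # always finish exactly at the planned maximum
    return steps


def saturation_point(levels, throughputs, min_gain=0.05):
    """Return the first level whose throughput gain over the previous
    level is below min_gain (as a fraction), or None if none is found."""
    for previous, current, level in zip(throughputs, throughputs[1:], levels[1:]):
        if current < previous * (1 + min_gain):
            return level
    return None


thread_counts = ramp_up_steps(4, 64)
# Invented throughput (operations/s) measured at each thread count.
measured_tps = [10_000, 19_500, 36_000, 37_000, 36_500]
print(thread_counts)
print(saturation_point(thread_counts, measured_tps))
```

With these invented numbers, throughput stops improving meaningfully at 32 threads, which suggests the saturation point lies between 16 and 32 threads and is worth probing with finer steps.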
After executing tests:
- Analyze the collected metrics and logs to detect bottlenecks, stability issues, and scaling thresholds.
- Compare observed results against the goals defined in the test plan and note any deviations from the planned outcomes.
- Conclude by shutting down or cleaning up all cluster instances and related resources to avoid interference with subsequent runs.
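Comparing results against test-plan goals can be sketched as follows. This is an illustrative example that assumes latency samples have already been exported as a list of milliseconds; `percentile` and `check_goals` are hypothetical helpers, the nearest-rank method is one of several common percentile definitions, and the sample values and limits are invented.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def check_goals(latencies_ms, goals_ms):
    """Map each percentile in goals_ms to (observed, limit, passed)."""
    results = {}
    for p, limit in goals_ms.items():
        observed = percentile(latencies_ms, p)
        results[p] = (observed, limit, observed <= limit)
    return results


# Invented latency samples (ms) and goals: p50 <= 2 ms, p99 <= 10 ms.
latencies_ms = [1.2, 0.9, 1.1, 1.0, 5.4, 1.3, 0.8, 1.0, 1.1, 12.0]
report = check_goals(latencies_ms, {50: 2.0, 99: 10.0})
for p, (observed, limit, passed) in report.items():
    status = "OK" if passed else "MISSED"
    print(f"p{p}: {observed} ms (limit {limit} ms) {status}")
```

Here the median goal is met while the p99 goal is missed because of a single 12 ms outlier, which is the kind of deviation worth tracing back to logs and system metrics.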
Advanced features
- Network Latency Simulation: Inject delays between groups of machines.
- CP Subsystem Testing: Configure CP member priorities using `cp_priorities`.
- Flight Recorder Integration: Enable Java Flight Recorder (JFR) profiling with `member_args`.
- Warmup/Cooldown: Configure warmup and cooldown periods to improve report accuracy.
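To illustrate why warmup and cooldown periods improve report accuracy, the sketch below trims samples taken during those windows before averaging. It does not show how Simulator implements this; `trim_warmup_cooldown` is a hypothetical helper, and the timestamps and throughput values are invented.

```python
from statistics import mean


def trim_warmup_cooldown(samples, warmup_s, cooldown_s):
    """Keep only samples taken after the warmup window and before the
    cooldown window.

    samples is a list of (timestamp_s, value) pairs, with timestamps
    measured in seconds from the start of the test.
    """
    if not samples:
        return []
    end = max(timestamp for timestamp, _ in samples)
    return [
        (timestamp, value)
        for timestamp, value in samples
        if warmup_s <= timestamp <= end - cooldown_s
    ]


# Invented per-second throughput: slow start (JIT, cache fill) and a drop at the end.
samples = [(0, 100), (1, 400), (2, 950), (3, 1000),
           (4, 1010), (5, 990), (6, 1000), (7, 600)]
steady = trim_warmup_cooldown(samples, warmup_s=2, cooldown_s=1)

print("mean over all samples:", mean(value for _, value in samples))
print("mean over steady state:", mean(value for _, value in steady))
```

In this invented run the untrimmed mean understates steady-state throughput considerably, because the first two seconds and the final second are not representative of the system under sustained load.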