Performance

Running Flow in production and at scale.

Minimum specifications

We recommend running Flow on a machine with at least 8 CPU cores and 8 GiB of memory. This recommendation is based on the minimum specifications for running the underlying Hazelcast cluster.
For further details, see the Hazelcast Performance Tips.

For production resilience, run multiple Flow instances to ensure continued uptime.

Flow uses spare resources efficiently, so it scales well as you add capacity to the underlying hardware.

Doubling the available resources can roughly halve your processing latency and increase your overall throughput.

Multiple instances

Flow should be run across multiple instances in your production environments. We recommend at least 3 instances to provide job resilience and maintain the quorum in the Hazelcast cluster.

This provides redundancy if a node fails and allows multiple jobs to be spread across the nodes in the cluster.
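To illustrate why three instances is a sensible minimum, here is a minimal sketch of the arithmetic behind a majority quorum. It assumes the cluster needs a strict majority of members to stay available; the quorum size actually enforced depends on how your Hazelcast cluster is configured, and the function names are illustrative only.

```python
def majority_quorum(cluster_size: int) -> int:
    """Smallest number of members that forms a strict majority."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Members that can be lost while a majority still remains."""
    return cluster_size - majority_quorum(cluster_size)


# Assumption: a strict-majority quorum. Check your own cluster settings.
for size in (1, 2, 3, 5):
    print(f"{size} nodes: quorum={majority_quorum(size)}, "
          f"tolerated failures={tolerated_failures(size)}")
```

Under this assumption, one or two nodes cannot lose a member without losing the majority, while three nodes can tolerate a single failure.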

As you run more jobs that need resources in parallel, you can scale Flow out by adding nodes.

For more detail on templates to configure Flow with multiple nodes, see Deploy Flow.

Tuning

When running Flow in production, configure an external Prometheus server to capture your metrics. This enables Flow to graph them on the Endpoint page, where you can see the throughput of your query and the latency of each message processed.

[Figure: performance metrics]

Throughput depends heavily on the complexity of your queries and on the external systems that Flow has to make requests to. A query running a basic streaming process with minimal lookups should scale until it exhausts your available processing resources. A query with many external lookups and connections is more likely to be bound by IO time, so monitor your query durations closely.
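As a rough illustration of the IO-bound case, the sketch below estimates an upper bound on throughput from lookup latency. It is a back-of-the-envelope model, not a description of Flow's scheduler: it assumes each message performs its lookups one after another, and that a fixed number of messages can be in flight at once. The function and parameter names are hypothetical.

```python
def max_throughput_per_second(
    messages_in_flight: int,
    lookup_latency_ms: float,
    lookups_per_message: int = 1,
) -> float:
    """Upper bound on messages per second when external lookups dominate."""
    if messages_in_flight < 1 or lookups_per_message < 1:
        raise ValueError("counts must be at least 1")
    if lookup_latency_ms <= 0:
        raise ValueError("lookup_latency_ms must be positive")
    # Time one message spends waiting on its sequential lookups.
    time_per_message_ms = lookup_latency_ms * lookups_per_message
    return messages_in_flight * 1000.0 / time_per_message_ms


# 16 messages in flight, one 50 ms lookup each.
print(max_throughput_per_second(16, 50.0))     # 320.0
# The same query with two sequential lookups per message.
print(max_throughput_per_second(16, 50.0, 2))  # 160.0
```

In this model, adding CPU cores does not raise the ceiling. Only reducing lookup latency, reducing the number of lookups, or increasing concurrency does.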

If your external lookups are cacheable, consider using Flow’s native caching annotations to avoid repeating the same API requests and to retrieve the data from local memory instead.
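The effect of caching a repeated lookup can be sketched in plain Python. This is not Flow's caching annotation syntax, which is covered in Flow's own documentation. It only illustrates the general principle, using the standard library's functools.lru_cache and a hypothetical fetch_customer_name function standing in for an external API request.

```python
from functools import lru_cache

api_calls = 0  # counts how often the "external" service is actually hit


def fetch_customer_name(customer_id: str) -> str:
    """Hypothetical stand-in for a slow external API request."""
    global api_calls
    api_calls += 1
    return f"customer-{customer_id}"


@lru_cache(maxsize=1024)
def cached_customer_name(customer_id: str) -> str:
    """Serve repeated lookups from local memory instead of the API."""
    return fetch_customer_name(customer_id)


for customer_id in ["a1", "b2", "a1", "a1", "b2"]:
    cached_customer_name(customer_id)

print(api_calls)  # 2
info = cached_customer_name.cache_info()
print(info.hits, info.misses)  # 3 2
```

Five lookups result in only two external requests. Caching like this is only safe when the looked-up data does not change within the lifetime of the cache entry.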