Configuring Jobs

Before submitting a job to a cluster, you can configure its name, whether to record metrics, and how it behaves when cluster members fail.

Job Configuration Options

These are the available configuration options for jobs.

processingGuarantee
  Usage: Set the strategy for taking snapshots of running jobs. Snapshots use additional in-memory storage.
  Allowed values: NONE, AT_LEAST_ONCE, EXACTLY_ONCE
  Default value: Depends on the source or sink.

snapshotIntervalMillis
  Usage: Set the interval in milliseconds between each snapshot. This interval is only relevant if the processingGuarantee setting is not NONE.
  Allowed values: positive INT
  Default value: 10000

autoScaling
  Usage: Enable jobs to scale automatically when new members are added to the cluster or existing members are removed from the cluster.
  Allowed values: boolean
  Default value: true

splitBrainProtectionEnabled
  Usage: In case of a network partition, enable a job to continue running only on the cluster quorum.
  Allowed values: boolean
  Default value: false

suspendOnFailure
  Usage: Sets what happens when a job execution fails.
    • If enabled, the job will be suspended on failure. A snapshot of the job’s computational state will be preserved. You can update the configuration of a suspended job and resume it programmatically or using SQL.
    • If disabled, the job will be terminated and the job snapshots will be deleted.
  Allowed values: boolean
  Default value: true

initialSnapshotName
  Usage: Name the snapshot from which to start the job.
  Allowed values: string
  Default value: ''
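To illustrate, here is a minimal sketch of setting several of these options on a JobConfig before submitting a pipeline. The pipeline definition itself is omitted, the variable names (hz, pipeline, jobConfig) are illustrative, and the option values are only examples:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.config.JobConfig;
import com.hazelcast.jet.config.ProcessingGuarantee;
import com.hazelcast.jet.pipeline.Pipeline;

HazelcastInstance hz = Hazelcast.bootstrappedInstance();
Pipeline pipeline = Pipeline.create();
// ... define the pipeline stages here ...

JobConfig jobConfig = new JobConfig()
        .setName("myJob")                                          // cluster-wide unique name
        .setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE)  // enable snapshots
        .setSnapshotIntervalMillis(10_000)                         // take a snapshot every 10 seconds
        .setAutoScaling(true)                                      // rescale when members join or leave
        .setSplitBrainProtection(true)                             // run only on the cluster quorum after a network partition
        .setSuspendOnFailure(true);                                // suspend and keep the snapshot if execution fails

hz.getJet().newJob(pipeline, jobConfig);

Any option you don't set keeps the default value listed above.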

Setting the Job Name

Each job has a cluster-wide unique ID and an optional name. Only one job with the same name can be running in the cluster at the same time.

Java:

JobConfig jobConfig = new JobConfig();
jobConfig.setName("myJob");

SQL:

CREATE JOB myJob

If a job with the same name is already running, the newly submitted job will fail. You can avoid this by using the JetService.newJobIfAbsent() method, as shown in the sketch below.
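For example, here is a minimal sketch of using JetService.newJobIfAbsent(), assuming an existing HazelcastInstance named hz and a Pipeline named pipeline:

JobConfig jobConfig = new JobConfig();
jobConfig.setName("myJob");

// Returns the running job with this name if one exists,
// otherwise submits a new job with this configuration.
Job job = hz.getJet().newJobIfAbsent(pipeline, jobConfig);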

Setting a Processing Guarantee for Streaming Jobs

When you set a processing guarantee other than NONE, Hazelcast takes distributed snapshots of your jobs. The distributed snapshot algorithm works by sending barriers down the event stream; when a Jet processor receives a barrier, it saves its state as a snapshot. Snapshots are saved in in-memory map backups and replicated across the cluster.

For details of the distributed snapshot algorithm, see Distributed Snapshot.

Since a processor can have multiple inputs, it must wait until the barrier is received from all inputs before taking a snapshot. The difference between AT_LEAST_ONCE and EXACTLY_ONCE lies in how the processor handles the barrier. With the AT_LEAST_ONCE option, the processor can continue to process items from inputs which have already received the barrier. This strategy results in lower latency and higher throughput overall, with the caveat that some items may be processed twice after a restart:

  • NONE: No snapshots are taken; after a restart due to a cluster change, the job restarts as if it were started from scratch.

  • AT_LEAST_ONCE: Enables snapshots. When a job is restarted, it resumes from the latest available snapshot. Items that were processed before the snapshot might be processed again after the job is resumed. This option provides better latency than EXACTLY_ONCE, at the cost of weaker guarantees.

  • EXACTLY_ONCE: Enables snapshots. When a job is restarted, it resumes from the latest available snapshot. Items that were processed before the snapshot are guaranteed not to be processed again after the job is resumed. This option provides the strongest correctness guarantee, but latency might increase due to the barrier alignment required in this processing mode.

For example, while a cluster is running a job, one or more members may leave the cluster due to an internal error, loss of networking, or deliberate shutdown for maintenance. In this case, Hazelcast must suspend the computation, re-plan it for the smaller cluster, and then resume in such a way that the state of computation remains intact.

Java:

JobConfig jobConfig = new JobConfig();
jobConfig.setProcessingGuarantee(ProcessingGuarantee.EXACTLY_ONCE);

SQL:

CREATE JOB myJob
OPTIONS (
    'processingGuarantee' = 'exactlyOnce'
)
Upholding a processing guarantee requires support from all the participants in the pipeline, including data sources and sinks. For example, to support the at-least-once processing guarantee, all sinks in a pipeline must be idempotent, allowing duplicate submission of the same data item without resulting in duplicate computations. For an in-depth explanation of this topic, see Fault Tolerance. To find out if a source supports processing guarantees, see Ingesting Data from Sources. To find out if a sink supports processing guarantees, see Sending Results to Sinks.

Auto-Scaling Jobs

By default, jobs scale up and down automatically when you add or remove a cluster node. To rescale a job, Hazelcast must restart it.

For an in-depth explanation of fault tolerance for jobs, see Fault Tolerance.

When auto-scaling is off and you add a new node to a cluster, the job will keep running on the previous nodes but not on the new one. However, if the job restarts for whatever reason, Hazelcast will automatically scale it to the whole cluster.
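For instance, here is a minimal sketch of disabling auto-scaling on a job, reusing the hz and pipeline variables assumed in the earlier sketch:

JobConfig jobConfig = new JobConfig()
        .setAutoScaling(false)                                        // keep the job on its current members when the cluster grows
        .setProcessingGuarantee(ProcessingGuarantee.AT_LEAST_ONCE);   // so the job is suspended rather than failed if a member leaves

hz.getJet().newJob(pipeline, jobConfig);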

The exact behavior of what happens when a node joins or leaves depends on whether a job is configured with a processing guarantee and with auto-scaling. The behavior of a job after a cluster change, depending on these two settings, is as follows:

Auto-scaling enabled (default), any processing guarantee:
  Member added: restart after a configured delay (scale-up-delay-millis)
  Member removed: restart immediately

Auto-scaling disabled, processing guarantee none:
  Member added: keep job running on old members
  Member removed: fail job

Auto-scaling disabled, processing guarantee at-least-once or exactly-once:
  Member added: keep job running on old members
  Member removed: suspend job

Storing Metrics

The following settings allow you to configure whether Hazelcast stores metrics such as the time that a job started and completed.

These metrics are not available in SQL. To access these metrics, you must use the Jet API.

metricsEnabled
  Description: Enable metrics for the job to be collected by the cluster.
  Allowed values: boolean
  Default value: true

storeMetricsAfterJobCompletion
  Description: Enable metrics to be stored in the cluster even after the job completes.
  Allowed values: boolean
  Default value: false
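
As a minimal sketch, assuming the hz and pipeline variables from the earlier examples, you could keep metrics after completion and then read them through the Jet API like this:

JobConfig jobConfig = new JobConfig()
        .setMetricsEnabled(true)                      // collect metrics for this job (default)
        .setStoreMetricsAfterJobCompletion(true);     // keep the metrics after the job completes

Job job = hz.getJet().newJob(pipeline, jobConfig);
job.join();

// The stored metrics remain queryable through the Jet API after completion.
System.out.println(job.getMetrics());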