Managing Jobs
Once a job is submitted, it has its own lifecycle on the cluster, independent of the client that submitted it. To manage that lifecycle, you can use either CLI commands or SQL statements to list, cancel, suspend, resume, and restart jobs.
Listing Jobs
Use the list-jobs command to get a list of all jobs running in the cluster.
You can also list completed jobs by adding the -a parameter.
For more details about the SQL statement for listing jobs, see the SQL reference documentation.
Canceling Jobs
Streaming jobs run indefinitely until canceled. To stop a job, you must cancel it.
When a job is canceled, the snapshot for the job is lost and the job can’t be resumed. Canceled jobs have a failed status.
To save a snapshot of the job when you cancel it with SQL, use the WITH SNAPSHOT clause.
For more details about the SQL statement for canceling jobs, see the SQL reference documentation.
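The cancellation rules above can be summarized as a small state model. The Java sketch below is a toy simulation under stated assumptions, not the Hazelcast API: ModelJob, JobStatus, and cancel(boolean withSnapshot) are hypothetical names, and the boolean parameter only stands in for the WITH SNAPSHOT clause. It shows that a canceled job ends in the failed status, keeps no snapshot unless one is saved at cancellation time, and cannot be resumed either way.

```java
import java.util.Optional;

// Hypothetical toy model of the cancel rules described above; not the Hazelcast API.
class CancelSketch {
    public static void main(String[] args) {
        ModelJob plain = new ModelJob("orders");
        plain.cancel(false);
        System.out.println(plain.status() + " " + plain.savedSnapshot().isPresent()); // FAILED false

        ModelJob kept = new ModelJob("payments");
        kept.cancel(true); // stands in for canceling WITH SNAPSHOT
        System.out.println(kept.status() + " " + kept.savedSnapshot().isPresent()); // FAILED true

        try {
            kept.resume(); // even with a saved snapshot, the canceled job itself cannot resume
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // payments cannot be resumed from FAILED
        }
    }
}

enum JobStatus { RUNNING, SUSPENDED, FAILED, COMPLETED }

final class ModelJob {
    private final String name;
    private JobStatus status = JobStatus.RUNNING;
    private String savedSnapshot; // stays null unless cancel(true) saves one

    ModelJob(String name) {
        this.name = name;
    }

    JobStatus status() {
        return status;
    }

    Optional<String> savedSnapshot() {
        return Optional.ofNullable(savedSnapshot);
    }

    // Cancels the job. When withSnapshot is true, the state is saved first;
    // otherwise the job's snapshot is simply lost.
    void cancel(boolean withSnapshot) {
        if (status == JobStatus.FAILED || status == JobStatus.COMPLETED) {
            throw new IllegalStateException(name + " has already finished");
        }
        if (withSnapshot) {
            savedSnapshot = name + "-snapshot"; // hypothetical snapshot name
        }
        status = JobStatus.FAILED; // canceled jobs report the failed status
    }

    // Only a suspended job can be resumed, so a canceled job never can.
    void resume() {
        if (status != JobStatus.SUSPENDED) {
            throw new IllegalStateException(name + " cannot be resumed from " + status);
        }
        status = JobStatus.RUNNING;
    }
}
```

The model deliberately says nothing about how a saved snapshot is used afterward; it only captures the rules stated in this section.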
Suspending and Resuming Jobs
Suspending and resuming jobs is useful, for example, when you need to perform maintenance on a data source or sink: you can pause the job during the maintenance and continue it afterward without losing its state.
When a job is suspended, all of its metadata is kept in the cluster. During the suspend operation, a snapshot of the job’s computational state is taken; when the job is resumed, it restarts gracefully from that snapshot.
To be suspended and resumed, a job must be configured with a processing guarantee. To learn more about setting a processing guarantee, see Configuring Jobs.
Use the suspend <job_name_or_id> and resume <job_name_or_id> commands to suspend and resume jobs.
To configure a job to be suspended automatically if its execution fails, see JobConfig.setSuspendOnFailure.
Use the ALTER JOB statement to suspend and resume jobs.
For more details about this statement, see the SQL reference documentation.
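To make the suspend and resume rules concrete, here is a toy Java model of them. It is a hypothetical sketch, not the Hazelcast API: SuspendableJob, Guarantee, and State are invented names, and a simple counter of processed items stands in for the job’s computational state. Suspending requires a processing guarantee and takes a snapshot of the counter; resuming continues from that snapshot. In the real Java client, jobs are suspended and resumed with the Job.suspend() and Job.resume() methods.

```java
// Hypothetical toy model of suspending and resuming a job; not the Hazelcast API.
class SuspendResumeSketch {
    public static void main(String[] args) {
        SuspendableJob job = new SuspendableJob("orders", Guarantee.EXACTLY_ONCE);
        job.process(5);
        job.suspend();                        // snapshot holds 5 processed items
        System.out.println(job.status());     // SUSPENDED
        job.resume();                         // continues from the snapshot
        job.process(3);
        System.out.println(job.processed());  // 8

        SuspendableJob unguarded = new SuspendableJob("clicks", Guarantee.NONE);
        try {
            unguarded.suspend();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // clicks has no processing guarantee
        }
    }
}

// Mirrors the idea of a processing guarantee; NONE means no guarantee is configured.
enum Guarantee { NONE, AT_LEAST_ONCE, EXACTLY_ONCE }

enum State { RUNNING, SUSPENDED }

final class SuspendableJob {
    private final String name;
    private final Guarantee guarantee; // metadata: kept while suspended
    private State status = State.RUNNING;
    private long processed;            // stands in for the computational state
    private long snapshot;             // written on suspend, read on resume

    SuspendableJob(String name, Guarantee guarantee) {
        this.name = name;
        this.guarantee = guarantee;
    }

    State status() { return status; }

    long processed() { return processed; }

    void process(int items) {
        if (status != State.RUNNING) {
            throw new IllegalStateException(name + " is not running");
        }
        processed += items;
    }

    void suspend() {
        if (guarantee == Guarantee.NONE) {
            throw new IllegalStateException(name + " has no processing guarantee");
        }
        if (status != State.RUNNING) {
            throw new IllegalStateException(name + " is not running");
        }
        snapshot = processed; // take a snapshot of the computational state
        processed = 0;        // in this model, the live state is dropped while suspended
        status = State.SUSPENDED;
    }

    void resume() {
        if (status != State.SUSPENDED) {
            throw new IllegalStateException(name + " is not suspended");
        }
        processed = snapshot; // start again from the same snapshot
        status = State.RUNNING;
    }
}
```

Because the live state is discarded on suspend, the final count of 8 can only come from restoring the snapshot, which is the behavior this section describes.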
Restarting Jobs
Restarting a job suspends and resumes it in one step. This is useful when you want to control when a job is scaled. For example, if a job’s auto-scaling option is disabled and you add three nodes to the cluster, you can manually restart the job at a convenient point so that the new nodes also run it.
For more details about the SQL statement for restarting jobs, see the SQL reference documentation.
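The manual-scaling example in this section can be sketched the same way. In this hypothetical Java model, which is not the Hazelcast API, ModelCluster and ScalableJob are invented names, and the model assumes that a job uses every node present when it was last started or resumed. With auto-scaling disabled, added nodes are ignored until restart(), which simply calls suspend() and then resume().

```java
// Hypothetical toy model of restarting a job with auto-scaling disabled; not the Hazelcast API.
class RestartSketch {
    public static void main(String[] args) {
        ModelCluster cluster = new ModelCluster(3);
        ScalableJob job = new ScalableJob(cluster, false); // auto-scaling disabled
        cluster.addNodes(3);
        System.out.println(job.nodesUsed()); // 3: the new nodes are not used yet
        job.restart();
        System.out.println(job.nodesUsed()); // 6: the restarted job uses every node
    }
}

final class ModelCluster {
    private int nodes;

    ModelCluster(int nodes) { this.nodes = nodes; }

    int nodes() { return nodes; }

    void addNodes(int count) { nodes += count; }
}

final class ScalableJob {
    private final ModelCluster cluster;
    private final boolean autoScaling;
    private int nodesUsed;      // nodes the job was last started on
    private boolean suspended;

    ScalableJob(ModelCluster cluster, boolean autoScaling) {
        this.cluster = cluster;
        this.autoScaling = autoScaling;
        this.nodesUsed = cluster.nodes(); // assumption: a job starts on all current nodes
    }

    int nodesUsed() {
        // Simplification: with auto-scaling enabled, the job is assumed to follow the cluster size.
        return autoScaling ? cluster.nodes() : nodesUsed;
    }

    void suspend() {
        if (suspended) throw new IllegalStateException("already suspended");
        suspended = true;
    }

    void resume() {
        if (!suspended) throw new IllegalStateException("not suspended");
        nodesUsed = cluster.nodes(); // a resumed job is planned on the current nodes
        suspended = false;
    }

    // Restart is suspend and resume in one step.
    void restart() {
        suspend();
        resume();
    }
}
```

Modeling restart as a composition of the two existing operations keeps the sketch aligned with the description above rather than introducing a separate code path.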