Managing Jobs
Once a job is submitted, it has its own lifecycle on the cluster, independent of the submitter. To manage that lifecycle, you can use either SQL statements or CLI commands to list, cancel, suspend, resume, and restart jobs.
Listing Jobs
Use the list-jobs command to get a list of all jobs running in the cluster:
bin/hz-cli list-jobs
Example result:
ID                  STATUS  SUBMISSION TIME         NAME
0401-9f77-b9c0-0001 RUNNING 2020-03-07T15:59:49.234 hello-world
You can also list completed jobs by adding the -a parameter:
bin/hz-cli list-jobs -a
Example result:
ID                  STATUS  SUBMISSION TIME         NAME
0402-de9d-35c0-0001 RUNNING 2020-03-08T15:14:11.439 hello-world-v2
0402-de21-7f00-0001 FAILED  2020-03-08T15:12:04.893 hello-world
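When the listing is long, it can help to filter it by status. The following is a minimal sketch, not part of the Hazelcast tooling: `jobs_with_status` is a hypothetical helper that assumes the column layout shown above (ID, STATUS, SUBMISSION TIME, NAME), a single header line, and job names without spaces.

```shell
# Hypothetical helper: print the names of jobs whose STATUS column matches
# the requested status. Reads `list-jobs` output on stdin.
# Assumptions: the first line is the header, the status is the second
# whitespace-separated field, and the job name is the last field.
jobs_with_status() {
  awk -v want="$1" 'NR > 1 && $2 == want { print $NF }'
}
```

For the example listing above, `bin/hz-cli list-jobs -a | jobs_with_status FAILED` would print `hello-world`.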
Use the SHOW JOBS statement to list jobs:
SHOW JOBS;
Example result:
+--------------------+
|name |
+--------------------+
|hello-world |
+--------------------+
For more details about this statement, see the SQL reference documentation.
Canceling Jobs
Streaming jobs run until they are canceled. To stop a job, use the cancel <job_name_or_id> command:
bin/hz-cli cancel hello-world
Example result:
Canceling job id=0402-de21-7f00-0001, name=hello-world, submissionTime=2020-03-08T15:12:04.893
Job canceled.
When a job is canceled, the snapshot for the job is lost and the job can’t be resumed. Canceled jobs have a failed status.
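Because a canceled job cannot be resumed, a script may want to confirm that the job is actually running before canceling it. The sketch below is one way to do that under stated assumptions: `cancel_if_running` and the `HZ_CLI` variable are hypothetical names, the job name is assumed to contain no spaces, and the status is read from the second column of the `list-jobs` output.

```shell
# Path to the CLI; override HZ_CLI if hz-cli lives elsewhere (assumption).
HZ_CLI="${HZ_CLI:-bin/hz-cli}"

# Hypothetical helper: cancel a job by name only if list-jobs reports it
# as RUNNING. Returns non-zero without canceling otherwise.
cancel_if_running() {
  job_name="$1"
  # Look up the STATUS column for the first row whose last field is the name.
  job_state="$("$HZ_CLI" list-jobs | awk -v n="$job_name" 'NR > 1 && $NF == n { print $2; exit }')"
  if [ "$job_state" = "RUNNING" ]; then
    "$HZ_CLI" cancel "$job_name"
  else
    echo "Job $job_name is not running; nothing canceled." >&2
    return 1
  fi
}
```

Calling `cancel_if_running hello-world` then runs the same `cancel` command shown above, but only after the status check passes.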
Use the DROP JOB statement to cancel a job:
DROP JOB IF EXISTS hello-world;
Result:
OK
To save a snapshot of the job, use the WITH SNAPSHOT clause.
For more details about this statement, see the SQL reference documentation.
Suspending and Resuming Jobs
Suspending and resuming jobs can be useful, for example, when you need to perform maintenance on a data source or sink without disrupting a running job.
When a job is suspended, all the metadata about the job is kept in the cluster. A snapshot of the job’s computational state is taken during the suspend operation, and when the job is resumed, it is gracefully started from that snapshot.
To suspend and resume a job, it must be configured with a processing guarantee. To learn more about setting a processing guarantee, see Configuring Jobs.
Use the suspend <job_name_or_id> and resume <job_name_or_id> commands to suspend and resume jobs:
bin/hz-cli suspend hello-world
Example result:
Suspending job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...
Job suspended.
bin/hz-cli resume hello-world
Example result:
Resuming job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...
Job resumed.
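For the maintenance scenario described above, the two commands can be wrapped so that the job is resumed even when the maintenance step fails. This is only a sketch: `with_job_suspended` and `HZ_CLI` are hypothetical names, and it assumes that `hz-cli` exits with a non-zero status when suspending or resuming fails, which is not confirmed here.

```shell
# Path to the CLI; override HZ_CLI if hz-cli lives elsewhere (assumption).
HZ_CLI="${HZ_CLI:-bin/hz-cli}"

# Hypothetical helper: suspend a job, run a maintenance command, then resume.
# Usage: with_job_suspended <job_name_or_id> <command> [args...]
with_job_suspended() {
  job="$1"
  shift
  # Do not start maintenance if the job could not be suspended.
  "$HZ_CLI" suspend "$job" || return 1
  # Run the maintenance command and remember its exit status.
  rc=0
  "$@" || rc=$?
  # Resume the job whether or not maintenance succeeded.
  "$HZ_CLI" resume "$job" || return 1
  return "$rc"
}
```

A call such as `with_job_suspended hello-world ./maintain-sink.sh` (the script name is a placeholder) returns the maintenance command's exit status after the job is resumed.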
To configure a job to be suspended automatically if its execution fails, see JobConfig.setSuspendOnFailure.
Use the ALTER JOB statement to suspend and resume jobs:
ALTER JOB hello-world SUSPEND;
Result:
OK
ALTER JOB hello-world RESUME;
For more details about this statement, see the SQL reference documentation.
Restarting Jobs
Restarting a job allows you to suspend and resume it in one step. This can be useful when you want to control when the job is scaled. For example, if a job’s auto-scaling option is disabled and you add three nodes to a cluster, you can manually restart the job at the desired point to make sure that all the new nodes can run it.
bin/hz-cli restart hello-world
Example result:
Restarting job id=0401-9f77-b9c0-0001, name=hello-world, submissionTime=2020-03-07T15:59:49.234...
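After adding nodes, you may want to restart every running job rather than a single one. The following sketch loops over the `list-jobs` output and restarts each job by ID. `restart_running_jobs` and `HZ_CLI` are hypothetical names, and the parsing assumes the column layout shown in the listing examples (ID first, STATUS second, one header line).

```shell
# Path to the CLI; override HZ_CLI if hz-cli lives elsewhere (assumption).
HZ_CLI="${HZ_CLI:-bin/hz-cli}"

# Hypothetical helper: restart every job that list-jobs reports as RUNNING.
restart_running_jobs() {
  "$HZ_CLI" list-jobs |
    awk 'NR > 1 && $2 == "RUNNING" { print $1 }' |
    while read -r job_id; do
      # Redirect stdin so the CLI cannot consume the remaining job IDs.
      "$HZ_CLI" restart "$job_id" </dev/null
    done
}
```

Using the job ID rather than the name avoids ambiguity if several jobs share a name or have none.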
ALTER JOB hello-world RESTART;
Result:
OK
For more details about this statement, see the SQL reference documentation.