Running Data Pipelines

Data Pipelines allow you to process data stored in one location and send the result to another, such as from a data lake to an analytics database or into a payment processing system. You can also use the same source and sink, in which case the pipeline only processes the data.

With the Hazelcast Platform Operator, you can run Data Pipelines from existing JAR files. Depending on the data source, Data Pipelines can be used for stream or batch processing. To create a Data Pipeline using the JetJob CR, the Jet Engine must be configured in the Hazelcast CR: both enabled and resourceUploadEnabled must be set to true.
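
For example, a minimal jet section in the Hazelcast CR looks as follows (the complete manifest is shown in the example at the end of this page):

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  jet:
    # Both flags are required before a JetJob CR can be submitted
    enabled: true
    resourceUploadEnabled: true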

To learn more about Data Pipelines and the Jet Engine, refer to the Platform documentation.

For a worked example, see the Run a data pipeline using Jet tutorial.

Configuring the JetJob Resource

The following are the configuration options for the JetJob resource. You can find more detailed information on the API Reference page.

  • name: Name of the Jet Job to be created. If empty, the CR name is used. It cannot be updated after the Jet Job has been created successfully.

  • hazelcastResourceName: Name of the Hazelcast resource on which the Jet Job runs.

  • state: Manages the job lifecycle. The default value is Running, and the value must be Running when the JetJob object is created for the first time.

  • jarName: Name of the JAR file to run; the file must be present on the member.

  • mainClass: Name of the main class to run in the submitted job.

  • bucketConfig: Downloads the JAR file specified in jarName from an external bucket and makes it accessible to the member (see the sketch after this list). The following parameter values must be supplied:

    • secretName: Name of the Secret object which holds the credentials for your cloud provider.

    • bucketURI: Full path for the external bucket. For example: gs://your-bucket/path/to/jars.

  • remoteURL: URL from which the JAR file will be downloaded.
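
As a sketch of how these fields fit together, the following JetJob downloads its JAR file from an external bucket when the job is submitted. The Secret name and bucket path are illustrative:

apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-from-bucket
spec:
  hazelcastResourceName: hazelcast
  state: Running
  jarName: my-data-pipeline.jar        # the JAR to fetch from the bucket
  bucketConfig:
    secretName: br-secret-gcp          # Secret holding cloud provider credentials
    bucketURI: "gs://your-bucket/path/to/jars"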

URL parameters

You can append query parameters to the bucketURI value for additional configuration. For example, to specify the AWS S3 endpoint to send requests to, use the endpoint parameter: s3://my-bucket?endpoint=my.endpoint.url&disableSSL=true&s3ForcePathStyle=true.

The following options are supported:

  • region: The AWS region for requests.

  • endpoint: The endpoint URL (hostname only or a fully qualified URI).

  • disable_ssl or disableSSL: A value of true disables SSL when sending requests.

  • s3_force_path_style or s3ForcePathStyle: A value of true forces the request to use path-style addressing.

  • dualstack: A value of true enables dual-stack (IPv4 and IPv6) endpoints.

  • fips: A value of true enables the use of FIPS endpoints.
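
As an illustration, a bucketConfig fragment that targets an S3-compatible endpoint could combine these parameters in the bucketURI; the Secret name and endpoint host here are hypothetical:

bucketConfig:
  secretName: br-secret-s3             # illustrative Secret name
  # Query parameters configure the S3 client used for the download
  bucketURI: "s3://my-bucket?endpoint=my.endpoint.url&disableSSL=true&s3ForcePathStyle=true"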

Providing the JAR file for the Data Pipeline

To run the Data Pipeline, you need to provide a JAR file that contains the pipeline. The JAR file can be downloaded before the cluster starts by configuring jet.bucketConfig, jet.remoteURLs, or jet.configMaps in the Hazelcast CR; in this case, all the files in the bucket are accessible to the member as soon as the cluster starts. Alternatively, you can configure bucketConfig or remoteURL in the JetJob CR; then only the JAR file specified in the jarName parameter is downloaded, at runtime, before the Data Pipeline starts.
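
As a sketch, a Hazelcast CR that pre-downloads a JAR file from a plain URL before the cluster starts might look like the following; the download URL is illustrative, and the exact shape of the remoteURLs field is described on the API Reference page:

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  licenseKeySecretName: hazelcast-license-key
  jet:
    enabled: true
    resourceUploadEnabled: true
    remoteURLs:
      - "https://example.com/jars/my-data-pipeline.jar"   # illustrative download URL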

JetJob state management

Once the job is created, you can use the state field to manage its lifecycle. The following state values are available:

  • Running. All jobs must be created with the Running state. It runs a newly created job or resumes a Suspended one.

  • Suspended. Gracefully suspends the Running job.

  • Canceled. Gracefully stops the job.

  • Restarted. Suspends and resumes the job in one step.

Deleting the JetJob resource will forcefully cancel the job.
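
For example, to gracefully suspend the job from the example below, change only the state field in its manifest and re-apply it; setting the value back to Running resumes the job:

apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-sample
spec:
  name: my-test-jet-job
  hazelcastResourceName: hazelcast
  state: Suspended                     # changed from Running to suspend the job
  jarName: my-data-pipeline.jar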

Example Configuration

The following JetJob resource runs the Data Pipeline from my-data-pipeline.jar on the cluster defined by the hazelcast Hazelcast resource.

apiVersion: hazelcast.com/v1alpha1
kind: Hazelcast
metadata:
  name: hazelcast
spec:
  clusterSize: 3
  repository: 'docker.io/hazelcast/hazelcast-enterprise'
  jet:
    enabled: true
    resourceUploadEnabled: true
    bucketConfig:
      secretName: br-secret-gcp
      bucketURI: "gs://your-bucket/path/to/jars"
  licenseKeySecretName: hazelcast-license-key
---
apiVersion: hazelcast.com/v1alpha1
kind: JetJob
metadata:
  name: jet-job-sample
spec:
  name: my-test-jet-job
  hazelcastResourceName: hazelcast
  state: Running
  jarName: my-data-pipeline.jar
For further information about accessing resources on different cloud providers, see Authorization Methods to Access Cloud Provider Resources.