Extract Transform Load (ETL)

ETL is a pipeline pattern for collecting data from various sources, transforming it to conform to a set of rules, and loading it into a sink. This pattern is often used to build materialized views, where data is processed and stored in a Hazelcast map for fast in-memory queries.

Extract

The first step is to ingest data into Hazelcast from one or more sources, such as files, databases, and streaming systems.

See Ingesting Data from Sources for details of Hazelcast sources.
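For example, a batch pipeline might extract data by reading the lines of files in a directory. The following fragment is a minimal sketch using Java and the Jet pipeline API; the directory path and variable names are placeholders, and the complete, runnable version appears at the end of this page.

    import com.hazelcast.jet.pipeline.BatchStage;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sources;

    Pipeline pipeline = Pipeline.create();

    // Extract: ingest every line of every file in the given directory.
    // "/data/orders" is a placeholder path, not a Hazelcast default.
    BatchStage<String> lines = pipeline.readFrom(Sources.files("/data/orders"));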

Transform

Once data is ingested into Hazelcast, you can transform it to suit your use case. For example, you may want to aggregate the data for analysis or reshape it to fit the schema of the destination.

Here are some example methods for transforming data:

  • Filtering

  • De-duplicating

  • Validating

  • Aggregating

Performing these transformations in Hazelcast, rather than in the source systems themselves, limits the performance impact on those systems and reduces the likelihood of data corruption.

See Transforms for more information.
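Continuing the sketch from the extract step, the fragment below filters out blank lines and then aggregates the rest by counting how often each distinct line occurs. The transformation itself is only illustrative; lines is the stage created in the extract fragment above.

    import java.util.Map.Entry;

    import com.hazelcast.jet.aggregate.AggregateOperations;
    import com.hazelcast.jet.pipeline.BatchStage;

    // Transform: drop blank lines, then count occurrences of each distinct line.
    BatchStage<Entry<String, Long>> counts = lines
            .filter(line -> !line.trim().isEmpty())
            .groupingKey(line -> line)
            .aggregate(AggregateOperations.counting());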

Load

The last step in any ETL pipeline is to send the data to its final destination, which is often called a sink.

See Sending Results to Sinks for details of Hazelcast sinks.
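To finish the sketch, the load step writes each (line, count) entry to a Hazelcast map, which acts as the sink. The map name "line-counts" is a placeholder.

    import com.hazelcast.jet.pipeline.Sinks;

    // Load: store each entry in the IMap named "line-counts" for in-memory queries.
    counts.writeTo(Sinks.map("line-counts"));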

An example of an ETL pipeline
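Putting the three steps together, the following is a minimal, self-contained sketch of a batch ETL pipeline. It assumes Java, an embedded Hazelcast member with the Jet engine enabled, and placeholder names for the class, the input directory, and the output map.

    import java.util.Map.Entry;

    import com.hazelcast.config.Config;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.jet.aggregate.AggregateOperations;
    import com.hazelcast.jet.pipeline.BatchStage;
    import com.hazelcast.jet.pipeline.Pipeline;
    import com.hazelcast.jet.pipeline.Sinks;
    import com.hazelcast.jet.pipeline.Sources;

    public class LineCountEtl {

        public static void main(String[] args) {
            Pipeline pipeline = Pipeline.create();

            // Extract: read every line of every file in the placeholder directory.
            BatchStage<String> lines = pipeline.readFrom(Sources.files("/data/orders"));

            // Transform: drop blank lines and count how often each distinct line occurs.
            BatchStage<Entry<String, Long>> counts = lines
                    .filter(line -> !line.trim().isEmpty())
                    .groupingKey(line -> line)
                    .aggregate(AggregateOperations.counting());

            // Load: write the results to the IMap "line-counts" for fast in-memory queries.
            counts.writeTo(Sinks.map("line-counts"));

            // Start an embedded member with the Jet engine enabled and run the job to completion.
            Config config = new Config();
            config.getJetConfig().setEnabled(true);
            HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            hz.getJet().newJob(pipeline).join();
        }
    }

After the job completes, the results can be read back from the "line-counts" map on the same member for in-memory queries, which is the materialized-view use case described at the top of this page.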