Ingesting Data from Sources
In order to process data, you first need to ingest it into your pipeline, using one or more sources.
A source is a bridge between a local or remote data system and your Hazelcast cluster, which allows you to ingest data into Hazelcast for processing or storage.
Hazelcast comes with many built-in sources, which we refer to as connectors.
In Hazelcast, each source defines which type of processing it supports:
Stream source: Reads from an infinite source that never ends.
Batch source: Reads from a finite source.
Stream sources can have either an at-least-once or exactly-once processing guarantee in case of a member failure. But, batch sources do not support these guarantees, instead pipelines that use these sources should just be restarted in case of a member failure.
If a pipeline pulls data from multiple sources, you can use both stream sources and batch sources in the same pipeline. And, you can use the following joining transforms to merge the branches:
For example, you can merge data from two sources into one by using the
Pipeline pipeline = Pipeline.create(); BatchSource<String> leftSource = TestSources.items("the", "quick", "brown", "fox"); BatchSource<String> rightSource = TestSources.items("jumps", "over", "the", "lazy", "dog"); BatchStage<String> left = pipeline.readFrom(leftSource); BatchStage<String> right = pipeline.readFrom(rightSource); left.merge(right) .writeTo(Sinks.logger());