Pipelining

With pipelining, you can send multiple requests in parallel using a single thread and therefore increase throughput. As an example, suppose that the round-trip time for a request/response is 1 millisecond. If synchronous requests are used, for example IMap.get(), then the maximum throughput a single thread can achieve is 1/0.001 = 1000 operations/second. One way to work around this limit is to introduce multithreading and make the requests in parallel. In the same example, using two threads doubles the maximum throughput from 1000 operations/second to 2000 operations/second.
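
The synchronous baseline looks roughly like the following sketch; the client setup, map name and key range are illustrative assumptions, not part of the Pipelining API.

// Synchronous baseline: each get() blocks for a full round trip, so a single
// thread is capped at roughly 1 / round-trip-time operations per second.
HazelcastInstance client = HazelcastClient.newHazelcastClient();
IMap<Integer, String> map = client.getMap("example-map"); // illustrative map name
Random random = new Random();
int keyDomain = 10_000;                                   // illustrative key range

for (int k = 0; k < 100; k++) {
    map.get(random.nextInt(keyDomain)); // each call blocks until its response arrives
}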

However, introducing threads just to execute requests is not always convenient and does not always lead to optimal performance; this is where pipelining can be used. Instead of using multiple threads for concurrent invocations, you can use asynchronous method calls such as IMap.getAsync(). If you keep two asynchronous calls in flight from a single thread, the maximum throughput is 2 * (1/0.001) = 2000 operations/second. Therefore, to benefit from pipelining, the asynchronous calls need to be made from a single thread. The Pipelining class is a convenience implementation that provides back pressure - that is, it controls the number of in-flight operations - and offers a convenient way to wait for all the results.
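
For comparison, here is roughly what two in-flight asynchronous calls from a single thread look like without Pipelining, reusing the map from the sketch above and assuming Hazelcast 4.x or later, where IMap.getAsync() returns a CompletionStage.

// Both calls are issued before either response is awaited, so the two
// round trips overlap on a single thread.
CompletionStage<String> first = map.getAsync(1);
CompletionStage<String> second = map.getAsync(2);

String firstValue = first.toCompletableFuture().join();
String secondValue = second.toCompletableFuture().join();

Tracking such futures by hand becomes awkward once you want to cap the number of in-flight calls and wait for all the results, which is what the Pipelining class in the following example takes care of: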

// map, random and keyDomain are assumed to be defined as in the earlier sketches.
// The constructor argument is the depth: at most 10 calls are in flight at once.
Pipelining<String> pipelining = new Pipelining<String>(10);
for (int k = 0; k < 100; k++) {
    int key = random.nextInt(keyDomain);
    // add() registers the asynchronous call and blocks while the pipeline is full
    pipelining.add(map.getAsync(key));
}
// wait for completion of all calls and collect their results
List<String> results = pipelining.results();

In the above example, we make 100 asynchronous map.getAsync() calls, but the maximum number of in-flight calls is 10.

You can increase throughput by increasing the depth of the pipeline. Pipelining has its own back pressure; you do not need to enable back pressure on the client or member for this to work. However, if you have many pipelines, you may still need to enable the client/member back pressure, because in that situation it is still possible to overwhelm the system with requests. See the Back Pressure section to learn how to enable it on the client or member.

You can use pipelining on both clients and members. It does not require any special configuration; it works out of the box.
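
As a minimal sketch of this, the same pipeline code can be fed from a map obtained on a member or from a client; the instance setup and map name below are illustrative.

// Either instance can supply the IMap used with Pipelining; no extra configuration is needed.
HazelcastInstance member = Hazelcast.newHazelcastInstance();
HazelcastInstance client = HazelcastClient.newHazelcastClient();

IMap<Integer, String> memberSideMap = member.getMap("example-map");
IMap<Integer, String> clientSideMap = client.getMap("example-map");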

Pipelining can be used with any asynchronous call. You can use it for the IMap asynchronous get/put methods as well as for ICache, IAtomicLong and so on. It cannot be used as a transaction mechanism, though: you cannot make some calls, throw away the pipeline and expect that none of the requests are executed. If you need atomic behavior, see Transactions for more details. Pipelining is just a performance optimization, not a mechanism for atomic behavior.

Deprecation Notice for Transactions

Transactions have been deprecated and will be removed in Hazelcast version 7.0. An improved version of this feature is under consideration. If you are already using transactions, get in touch and share your use case. Your feedback will help us to develop a solution that meets your needs.

Pipelines are cheap to create and should be replaced regularly because they accumulate results. It is fine to have a few hundred or even a few thousand calls processed by a single pipeline. However, all responses to all requests are stored in the pipeline for as long as the pipeline is referenced. So if you want to process a huge number of requests, wait for the pipeline's results every few hundred or few thousand calls and then create a new instance, as shown in the sketch below.
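
The following is a minimal sketch of that pattern; the total call count, the batch size and the process() handler are illustrative assumptions.

// Drain the pipeline every 1,000 calls so accumulated results do not pile up.
// map, random and keyDomain are assumed to be defined as in the earlier sketches.
int batchSize = 1_000;
Pipelining<String> pipelining = new Pipelining<String>(10);
for (int k = 0; k < 1_000_000; k++) {
    pipelining.add(map.getAsync(random.nextInt(keyDomain)));
    if ((k + 1) % batchSize == 0) {
        process(pipelining.results());           // hypothetical handler for one batch of results
        pipelining = new Pipelining<String>(10); // start a fresh pipeline for the next batch
    }
}
process(pipelining.results());                   // drain whatever is left in the last batch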

Note that pipelines are not thread-safe; they must be used by a single thread.