A newer version of IMDG is available.

View latest

Want to try Hazelcast Platform?

We’ve combined the in-memory storage of IMDG with the stream processing power of Jet to bring you the all new Hazelcast Platform.

How Distributed Query Works

The requested predicate is sent to each member in the cluster. Each member looks at its own local entries and filters them according to the predicate. At this stage, key/value pairs of the entries are deserialized and then passed to the predicate. The predicate requester merges all the results coming from each member into a single set.

Distributed query is highly scalable. If you add new members to the cluster, the partition count for each member is reduced and thus the time spent by each member on iterating its entries is reduced. In addition, the pool of partition threads evaluates the entries concurrently in each member and the network traffic is also reduced since only filtered data is sent to the requester.

Hazelcast offers the following APIs for distributed query purposes:

  • Criteria API

  • Distributed SQL Query

Employee Map Query Example

Assume that you have an "employee" map containing values of Employee objects, as coded below.

public class Employee implements Serializable {
    private String name;
    private int age;
    private boolean active;
    private double salary;

    public Employee(String name, int age, boolean active, double salary) {
        this.name = name;
        this.age = age;
        this.active = active;
        this.salary = salary;
    }

    public Employee() {
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public double getSalary() {
        return salary;
    }

    public boolean isActive() {
        return active;
    }
}

Now let’s look for the employees who are active and have an age less than 30 using the aforementioned APIs (Criteria API and Distributed SQL Query). The following subsections describe each query mechanism for this example.

When using Portable objects, if one field of an object exists on one member but does not exist on another one, Hazelcast does not throw an unknown field exception. Instead, Hazelcast treats that predicate, which tries to perform a query on an unknown field, as an always false predicate.

Querying with Criteria API

Criteria API is a programming interface offered by Hazelcast that is similar to the Java Persistence Query Language (JPQL). Below is the code for the above example query.

IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );

EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
Predicate predicate = e.is( "active" ).and( e.get( "age" ).lessThan( 30 ) );

Collection<Employee> employees = map.values( predicate );

In the above example code, predicate verifies whether the entry is active and its age value is less than 30. This predicate is applied to the employee map using the map.values(predicate) method. This method sends the predicate to all cluster members and merges the results coming from them. Since the predicate is communicated between the members, it needs to be serializable.

Predicates can also be applied to keySet, entrySet and localKeySet of the Hazelcast distributed map.

Predicates Class Operators

The Predicates class includes many operators for your query requirements. The following are descriptions for some of them:

  • equal: Checks if the result of an expression is equal to a given value.

  • notEqual: Checks if the result of an expression is not equal to a given value.

  • instanceOf: Checks if the result of an expression has a certain type.

  • like: Checks if the result of an expression matches some string pattern. % (percentage sign) is the placeholder for many characters, (underscore) is placeholder for only one character.

  • ilike: A case-insensitive variant of like.

  • greaterThan: Checks if the result of an expression is greater than a certain value.

  • greaterEqual: Checks if the result of an expression is greater than or equal to a certain value.

  • lessThan: Checks if the result of an expression is less than a certain value.

  • lessEqual: Checks if the result of an expression is less than or equal to a certain value.

  • between: Checks if the result of an expression is between two values (this is inclusive).

  • in: Checks if the result of an expression is an element of a certain collection.

  • isNot: Checks if the result of an expression is false.

  • regex: Checks if the result of an expression matches some regular expression.

  • alwaysTrue: The result of an expression always matches.

  • alwaysFalse: The result of an expression ever matches.

See the Predicates Javadoc for all predicates provided.

Combining Predicates with AND, OR, NOT

You can combine predicates using the and, or and not operators, as shown in the below examples.

public Collection<Employee> getWithNameAndAge( String name, int age ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate agePredicate = Predicates.equal( "age", age );
    Predicate predicate = Predicates.and( namePredicate, agePredicate );
    return employeeMap.values( predicate );
}
public Collection<Employee> getWithNameOrAge( String name, int age ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate agePredicate = Predicates.equal( "age", age );
    Predicate predicate = Predicates.or( namePredicate, agePredicate );
    return employeeMap.values( predicate );
}
public Collection<Employee> getNotWithName( String name ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate predicate = Predicates.not( namePredicate );
    return employeeMap.values( predicate );
}

Simplifying with PredicateBuilder

You can simplify predicate usage with the PredicateBuilder interface, which offers simpler predicate building. See the below example code which selects all people with a certain name and age.

public Collection<Employee> getWithNameAndAgeSimplified( String name, int age ) {
    EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
    Predicate agePredicate = e.get( "age" ).equal( age );
    Predicate predicate = e.get( "name" ).equal( name ).and( agePredicate );
    return employeeMap.values( predicate );
}

Querying with SQL

Predicates.sql() takes the regular SQL where clause. Here is an example:

IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );
Set<Employee> employees = map.values( Predicates.sql( "active AND age < 30" ) );
Hazelcast offers an SQL service that allows to execute SQL queries, as opposed to SQL-like predicates in case of Predicates.sql(). See the SQL chapter for more information.

Supported SQL Syntax

AND/OR: `<expression> AND <expression> AND <expression>…​ `

  • active AND age>30

  • active=false OR age = 45 OR name = 'Joe'

  • active AND ( age > 20 OR salary < 60000 )

Equality: =, !=, <, ⇐, >, >=

  • <expression> = value

  • age ⇐ 30

  • name = 'Joe'

  • salary != 50000

BETWEEN: <attribute> [NOT] BETWEEN <value1> AND <value2>

  • age BETWEEN 20 AND 33 ( same as age >= 20 AND age ⇐ 33 )

  • age NOT BETWEEN 30 AND 40 ( same as age < 30 OR age > 40 )

IN: <attribute> [NOT] IN (val1, val2,…​)

  • age IN ( 20, 30, 40 )

  • age NOT IN ( 60, 70 )

  • active AND ( salary >= 50000 OR ( age NOT BETWEEN 20 AND 30 ) )

  • age IN ( 20, 30, 40 ) AND salary BETWEEN ( 50000, 80000 )

LIKE: <attribute> [NOT] LIKE "expression"

The % (percentage sign) is placeholder for multiple characters, an _ (underscore) is placeholder for only one character.

  • name LIKE 'Jo%' (true for 'Joe', 'Josh', 'Joseph' etc.)

  • name LIKE 'Jo_' (true for 'Joe'; false for 'Josh')

  • name NOT LIKE 'Jo_' (true for 'Josh'; false for 'Joe')

  • name LIKE 'J_s%' (true for 'Josh', 'Joseph'; false 'John', 'Joe')

ILIKE: <attribute> [NOT] ILIKE 'expression'

Similar to LIKE predicate but in a case-insensitive manner.

  • name ILIKE 'Jo%' (true for 'Joe', 'joe', 'jOe','Josh','joSH', etc.)

  • name ILIKE 'Jo_' (true for 'Joe' or 'jOE'; false for 'Josh')

REGEX: <attribute> [NOT] REGEX 'expression'

  • name REGEX 'abc-.*' (true for 'abc-123'; false for 'abx-123')

You can escape the % and _ placeholder characters in your SQL queries with predicates using the backslash (\) character. The apostrophe (') can be escaped with another apostrophe, i.e., ''. If you use REGEX, you need to escape characters according to the normal Java escape syntax; see here for the details.

Querying Entry Keys with Predicates

You can use __key attribute to perform a predicated search for entry keys. See the following example:

IMap<String, Person> personMap = hazelcastInstance.getMap(persons);
personMap.put("Alice", new Person("Alice", 35, Gender.FEMALE));
personMap.put("Andy",  new Person("Andy",  37, Gender.MALE));
personMap.put("Bob",   new Person("Bob",   22, Gender.MALE));
[...]
Predicate predicate = Predicates.sql("__key like A%");
Collection<Person> startingWithA = personMap.values(predicate);

In this example, the code creates a collection with the entries whose keys start with the letter "A”.

Querying JSON Strings

You can query JSON strings stored inside your Hazelcast clusters. To query a JSON string, you first need to create a HazelcastJsonValue from the JSON string. You can use HazelcastJsonValues both as keys and values in the distributed data structures. Then, it is possible to query these objects using the Hazelcast query methods explained in this section.

String person1 = "{ \"name\": \"John\", \"age\": 35 }";
String person2 = "{ \"name\": \"Jane\", \"age\": 24 }";
String person3 = "{ \"name\": \"Trey\", \"age\": 17 }";

IMap<Integer, HazelcastJsonValue> idPersonMap = instance.getMap("jsonValues");

idPersonMap.put(1, new HazelcastJsonValue(person1));
idPersonMap.put(2, new HazelcastJsonValue(person2));
idPersonMap.put(3, new HazelcastJsonValue(person3));

Collection<HazelcastJsonValue> peopleUnder21 = idPersonMap.values(Predicates.lessThan("age", 21));

When running the queries, Hazelcast treats values extracted from the JSON documents as Java types so they can be compared with the query attribute. JSON specification defines five primitive types to be used in the JSON documents: number,string, true, false and null. The string, true/false and null types are treated as String, boolean and null, respectively. We treat the extracted number values as longs if they can be represented by a long. Otherwise, numbers are treated as doubles.

It is possible to query nested attributes and arrays in JSON documents. The query syntax is the same as querying other Hazelcast objects as explained in the Querying in Collections and Arrays section.

/**
 * Sample JSON object
 *
 * {
 *     "departmentId": 1,
 *     "room": "alpha",
 *     "people": [
 *         {
 *             "name": "Peter",
 *             "age": 26,
 *             "salary": 50000
 *         },
 *         {
 *             "name": "Jonah",
 *             "age": 50,
 *             "salary": 140000
 *         }
 *     ]
 * }
 *
 *
 * The following query finds all the departments that have a person named "Peter" working in them.
 */
Collection<HazelcastJsonValue> departmentWithPeter = departments.values(Predicates.equal("people[any].name", "Peter"));

HazelcastJsonValue is a lightweight wrapper around your JSON strings. It is used merely as a way to indicate that the contained string should be treated as a valid JSON value. Hazelcast does not check the validity of JSON strings put into to maps. Putting an invalid JSON string in a map is permissible. However, in that case whether such an entry is going to be returned or not from a query is not defined.

Metadata Creation for JSON Querying

Hazelcast stores a metadata object per HazelcastJsonValue stored. This metadata object is created every time a HazelcastJsonValue is put into an IMap. Metadata is later used to speed up the query operations. Metadata creation is on by default. Depending on your application’s needs, you may want to turn off the metadata creation to decrease the put latency and increase the throughput. You can configure this using Metadata Policy.

JSON metadata is stored on-heap even when you use the NATIVE in-memory format. If you are storing HazelcastJsonValues in your NATIVE maps, there is a certain amount of on-heap cost per object. Metadata is not created unless you put HazelcastJsonValues in your NATIVE maps even when metadata creation is on.

Filtering with Paging Predicates

Hazelcast provides paging for defined predicates. With its PagingPredicate interface, you can get a collection of keys, values, or entries page by page by filtering them with predicates and giving the size of the pages. Also, you can sort the entries by specifying comparators. In this case, the comparator should be Serializable and the serialization factory implementations you use, e.g., PortableFactory and DataSerializableFactory, should be registered. See the Serialization chapter on how to register these factories.

Paging predicates require the objects to be deserialized both on the calling side (either a member or client) and the member side from which the collection is retrieved. Therefore, you need to register the serialization factories you use on all the members and clients on which the paging predicates are used. See the Serialization chapter on how to register these factories.

In the example code below:

  • The greaterEqual predicate gets values from the "students" map. This predicate has a filter to retrieve the objects with an "age" greater than or equal to 18.

  • Then a PagingPredicate is constructed in which the page size is 5, so that there are five objects in each page. The first time the values() method is called, the first page is fetched.

  • Finally, the subsequent page is fetched by calling the nextPage() method of PagingPredicate and querying the map again with the updated PagingPredicate.

IMap<Integer, Student> map = hazelcastInstance.getMap( "students" );
Predicate greaterEqual = Predicates.greaterEqual( "age", 18 );
PagingPredicate pagingPredicate = Predicates.pagingPredicate( greaterEqual, 5 );
// Retrieve the first page
Collection<Student> values = map.values( pagingPredicate );
...
// Set up next page
pagingPredicate.nextPage();
// Retrieve next page
values = map.values( pagingPredicate );
...

If a comparator is not specified for PagingPredicate, but you want to get a collection of keys or values page by page, keys or values must be instances of Comparable (i.e., they must implement java.lang.Comparable). Otherwise, the java.lang.IllegalArgument exception is thrown.

You can also access a specific page more easily with the help of the setPage() method. This way, if you make a query for the hundredth page, for example, it gets all 100 pages at once instead of reaching the hundredth page one by one using the nextPage() method. Note that this feature tires the memory and see the PagingPredicate Javadoc.

Paging Predicate, also known as Order & Limit, is not supported in Transactional Context.

Filtering with Partition Predicate

You can run queries on a single partition in your cluster using the partition predicate (PartitionPredicate).

The Predicates.partitionPredicate() method takes a predicate and partition key as parameters, gets the partition ID using the key and runs that predicate only on the partition where that key belongs.

See the following code snippet:

...
Predicate predicate = Predicates.partitionPredicate(partitionKey, Predicates.alwaysTrue());

Collection<Integer> values = map.values(predicate);
Collection<String> keys = map.keySet(predicate);
...

By default there are 271 partitions, and using a regular predicate, each partition needs to be accessed. However, if the partition predicate only accesses a single partition, this can lead to a big performance gain.

For the partition predicate to work correctly, you need to know which partition your data belongs to so that you can send the request to the correct partition. One of the ways of doing it is to make use of the PartitionAware interface when data is inserted, thereby controlling the owning partition. See the PartitionAware section for more information and examples.

A concrete example may be a web shop that sells phones and accessories. To find all the accessories of a phone, a query could be executed that selects all accessories for that phone. This query is executed on all members in the cluster and therefore could generate quite a lot of load. However, if we would store the accessories in the same partition as the phone, the partition predicate could use the partitionKey of the phone to select the right partition and then it queries for the accessories for that phone; and this reduces the load on the system and get faster query results.

Indexing Queries

Hazelcast distributed queries run on each member in parallel and return only the results to the caller. Then, on the caller side, the results are merged.

When a query runs on a member, Hazelcast iterates through all the owned entries and finds the matching ones. This can be made faster by indexing the most-queried fields, just like you would do for your database. Indexing adds overhead for each write operation but reading will be a lot faster. If you query your map a lot, make sure to add indexes for the most frequently queried fields. For example, if you do active AND age < 30 query, make sure you add an index for the active and age fields. The following example code does that by getting the map from the Hazelcast instance and adding indexes to the map with the IMap addIndex method.

IMap map = hazelcastInstance.getMap( "employees" );
// ordered, since we have ranged queries for this field
map.addIndex(new IndexConfig(IndexType.SORTED, "age"));
// not ordered, because boolean field cannot have range
map.addIndex(new IndexConfig(IndexType.HASH, "active"));

Note that creating indexes once is sufficient. Subsequent write operations on the map are reflected in the index automatically. So, although it is safe to call the addIndex() method repeatedly, there will be a performance penalty due to the redundant index creation.

When you call, for example, map.addIndex("fieldName", true), each partition iterates over its records and adds each entry to the index. The previously created index entry will be recreated and replaced with the new entry. The performance penalty will be proportional to the number of entries. If you have maps with a large number of entries, then synchronizing index addition process is recommended.

Other than using the addIndex() method, you can define your index declaratively or programmatically as described in the Configuring IMap Indexes section.

Indexing Ranged Queries

IMap.addIndex(IndexConfig) is used for adding index. For each indexed field, if you have ranged queries such as age>30, age BETWEEN 40 AND 60, then use IndexType.SORTED index Otherwise, use IndexType.HASH.

Configuring IMap Indexes

Also, you can define IMap indexes in configuration. An example is shown below.

  • XML

  • YAML

  • Spring

<hazelcast>
    ...
    <map name="default">
        <indexes>
            <index type="HASH">
                <attributes>
                    <attribute>name</attribute>
                </attributes>
            </index>
            <index>
                <attributes>
                    <attribute>age</attribute>
                </attributes>
            </index>
        </indexes>
    </map>
    ...
</hazelcast>
hazelcast:
  map:
    default:
      indexes:
        - type: HASH
            attributes:
              - "name"
        - attributes:
            - "age"
<hz:map name="default">
    <hz:indexes>
        <hz:index type="HASH">
            <hz:attributes>
                <hz:attribute>name</hz:attribute>
            </hz:attributes>
        </hz:index>
        <hz:index>
            <hz:attributes>
                <hz:attribute>age</hz:attribute>
            </hz:attributes>
        </hz:index>
    </hz:indexes>
</hz:map>

You can also define IMap indexes using programmatic configuration, as in the example below.

mapConfig.addIndexConfig(new IndexConfig(IndexType.HASH, "name"));
mapConfig.addIndexConfig(new IndexConfig(IndexType.SORTED, "age"));

The following is the Spring declarative configuration for the same example.

Non-primitive types to be indexed should implement Comparable.
If you configure the data structure to use High-Density Memory Store and indexes, the indexes are automatically stored in the High-Density Memory Store as well. This prevents from running into full garbage collections when doing a lot of updates to index.

Global and Partitioned Indexes

The on-heap indexes are always global, i.e., one index covers all IMaps entries stored on the partitions owned by a cluster member. Such indexes are beneficial for lookup and range queries because only one lookup operation is needed to execute a query. A drawback of global indexes is a potentially high contention on the index concurrent data structure that might cause performance degradation.

High-Density Memory Store supports partitioned indexes. Each partition owned by a cluster member has its own index. All operations on the partitioned index are performed on the partitioned thread, thus eliminating the contention issue of the global indexes. However, lookup and range queries have to perform lookup operations on every partition and combine the results. Normally, these partition and combine executions yield poorer performance results compared to the global indexes.

Global concurrent indexes (based on our own off-heap B+ Tree implementation) bring all the benefits of global indexes to IMap backed by High-Density Memory Store.

The global High-Density Memory Store indexes are enabled by default and controlled by the hazelcast.hd.global.index.enabled property. You can disable these indexes by setting this property to false.

Composite Indexes

Composite indexes, also known as compound indexes, are special kind of indexes that are built on top of the multiple map entry attributes and therefore may be used to significantly speed up the queries involving those attributes simultaneously.

There are two distinct composite index types used for two different purposes: unordered composite indexes and ordered ones.

Unordered Composite Indexes

The unordered indexes are used to perform equality queries, also known as the point queries, e.g., name = 'Alice'. These are specifically optimized for equality queries and don’t support other comparison operators like > or <=.

Additionally, the composite unordered indexes allow speeding up the equality queries involving multiple attributes simultaneously, e.g., name = 'Alice' and age = 33. This example query results in a single composite index lookup operation which can be performed very efficiently.

The unordered composite index on the name and age attributes may be configured for a map as follows:

  • XML

  • YAML

<hazelcast>
    ...
    <map name="persons">
        <indexes>
            <index type="HASH">
                <attributes>
                    <attribute>name</attribute>
                    <attribute>age</attribute>
                </attributes>
            </index>
        </indexes>
    </map>
    ...
</hazelcast>
hazelcast:
  map:
    default:
      - type: HASH
          attributes:
            - "name"
            - "age"

The attributes indexed by the unordered composite indexes can’t be matched partially: the name = 'Alice' query can’t utilize the composite index configured above.

Ordered Composite Indexes

The ordered indexes are specifically designed to perform efficient order comparison queries, also known as the range queries, e.g., age > 33. The equality queries, like age = 33, are still supported by the ordered indexes, but they are handled in a slightly less efficient manner comparing to the unordered indexes.

The composite ordered indexes extend the concept by allowing multiple equality predicates and a single order comparison predicate to be combined into a single index query operation. For instance, the name = 'Alice' and age > 33 and name = 'Bob' and age = 33 and balance > 0.0 queries are good candidates to be covered by an ordered composite index configured as follows:

  • XML

  • YAML

<hazelcast>
    ...
    <map name="persons">
        <indexes>
            <index>
                <attributes>
                    <attribute>name</attribute>
                    <attribute>age</attribute>
                    <attribute>balance</attribute>
                </attributes>
            </index>
        </indexes>
    </map>
    ...
</hazelcast>
hazelcast:
  map:
    persons:
      indexes:
        - attributes:
          - "name"
          - "age"
          - "balance"

Unlike the unordered composite indexes, partial attribute prefixes may be matched for the ordered composite indexes. In general, a valid non-empty attribute prefix is formed as a sequence of zero or more equality predicates followed by a zero or exactly one order comparison predicate. Given the index definition above, the following queries may be served by the index: name = 'Alice', name > 'Alice', name = 'Alice' and age > 33, name = 'Alice' and age = 33 and balance = 5.0. The following queries can’t be served the index: age = 33, age > 33 and balance = 0.0, balance > 0.0.

While matching the ordered composite indexes, multiple order comparison predicates acting on the same attribute are treated as a single range predicate acting on that attribute. Given the index definition above, the following queries may be served by the index: name > 'Alice' and name < 'Bob', name = 'Alice' and age > 33 and age < 55, name = 'Alice' and age = 33 and balance > 0.0 and balance < 100.0.

Composite Index Matching and Selection

The order of attributes involved in a query plays no role in the selection of the matching composite index: name = 'Alice' and age = 33 and age = 33 and name = 'Alice' queries are equivalent from the point of view of the index matching procedure.

The attributes involved in a query can be matched partially by the composite index matcher: name = 'Alice' and age = 33 and balance > 0.0 can be partially matched by the name, age composite index, the name = 'Alice' and age = 33 predicates are served by the matched index, while the balance > 0.0 predicate is processed by other means.

Bitmap Indexes

Bitmap indexes provide capabilities similar to unordered/hash indexes. The same set of predicates is supported:

  • equal

  • notEqual

  • in,

  • and

  • or

  • not

But, unlike hash indexes, bitmap indexes are able to achieve a much higher memory efficiency for low cardinality attributes at the cost of reduced query performance. In practice, the query performance is comparable to the performance of hash indexes, while memory footprint reduction is high, usually around an order of magnitude.

Bitmap indexes are specifically designed for indexing of collection and array attributes since a single IMap entry produces many index entries in that case. A single hash index entry costs a few tens of bytes, while a single bitmap index entry usually costs just a few bytes.

It’s also possible to improve the memory footprint while indexing regular single-value attributes, but the improvement is usually minor, depending on the data layout and total number of indexes.

Currently, bitmap indexes are not supported by off-heap High-Density Memory Stores (HD).

Configuring Bitmap Indexes

In the simplest form, bitmap index for an IMap entry attribute can be declaratively configured as follows:

  • XML

  • YAML

<hazelcast>
    ...
    <map name="persons">
        <indexes>
            <index type="BITMAP">
                <attributes>
                    <attribute>age</attribute>
                </attributes>
            </index>
        </indexes>
    </map>
    ...
</hazelcast>
hazelcast:
  map:
    persons:
      indexes:
        - type: BITMAP
          attributes:
            - "age"

Internally, a unique non-negative long ID is assigned to every indexed IMap entry based on the entry key. That unique ID is required for bitmap indexes to distinguish one indexed IMap entry from another.

The mapping between IMap entries and long IDs is not free and its performance and memory footprint can be improved in certain cases. For instance, if IMap entries already have a unique integer-valued attribute, the attribute values can be used as unique long IDs directly without any additional transformations. That can be configured as follows:

  • XML

  • YAML

<index type="BITMAP">
    <attributes>
        <attribute>age</attribute>
    </attributes>
    <bitmap-index-options>
        <unique-key>uniqueId</unique-key>
        <unique-key-transformation>RAW</unique-key-transformation>
    </bitmap-index-options>
</index>
      indexes:
        - type: BITMAP
          attributes:
            - "age"
          bitmap-index-options:
            unique-key: uniqueId
            unique-key-transformation: RAW

The index definition above instructs Hazelcast to create a bitmap index on the age attribute, extract the unique key values from uniqueId attribute and use the raw (RAW) extracted values directly as long IDs. If the extracted unique key value is not of long type, the widening conversion is performed for the following types: byte, short and int; boxed variants are also supported.

In certain cases, the extracted raw IDs might be randomly distributed. This causes increased memory usage in bitmap indexes since the best case scenario for them is sequential contiguous IDs. That can be countered by applying the renumbering technique:

  • XML

  • YAML

<index type="BITMAP">
    <attributes>
        <attribute>age</attribute>
    </attributes>
    <bitmap-index-options>
        <unique-key>uniqueId</unique-key>
        <unique-key-transformation>LONG</unique-key-transformation>
    </bitmap-index-options>
</index>
      indexes:
        - type: BITMAP
          attributes:
            - "age"
          bitmap-index-options:
            unique-key: uniqueId
            unique-key-transformation: LONG

The index definition above instructs the bitmap index to extract the unique keys from uniqueId attribute, convert every extracted non-negative value to long (LONG) and assign an internal sequential unique long ID based on that extracted and then converted unique value. The widening conversion is applied to the extracted values, if necessary.

This long-to-long mapping is performed more efficiently than the general object-to-long mapping done for the simple index definitions. Basically, the following simple bitmap index definition:

  • XML

  • YAML

<index type="BITMAP">
    <attributes>
        <attribute>age</attribute>
    </attributes>
</index>
      indexes:
        - type: BITMAP
          attributes:
            - "age"

is equivalent to the following full-form definition:

  • XML

  • YAML

<index type="BITMAP">
    <attributes>
        <attribute>age</attribute>
    </attributes>
    <bitmap-index-options>
        <unique-key>__key</unique-key>
        <unique-key-transformation>OBJECT</unique-key-transformation>
    </bitmap-index-options>
</index>
      indexes:
        - type: BITMAP
          attributes:
            - "age"
          bitmap-index-options:
            unique-key: __key
            unique-key-transformation: OBJECT

Which indexes age attribute, uses IMap entry keys (__key) interpreted as Java objects (OBJECT) to assign internal unique long IDs.

The full-form definition syntax is defined as follows:

  • XML

  • YAML

<index type="BITMAP">
    <attributes>
        <attribute><attr></attribute>
    </attributes>
    <bitmap-index-options>
        <unique-key><key></unique-key>
        <unique-key-transformation><transformation></unique-key-transformation>
    </bitmap-index-options>
</index>
      indexes:
        - type: BITMAP
          attributes:
            - <attribute>
          bitmap-index-options:
            unique-key: <key>
            unique-key-transformation: <transformation>

The following are the parameter descriptions:

  • <attr>: Specifies the attribute index.

  • <key>: Specifies the attribute to use as a unique key source for internal unique long ID assignment.

  • <transformation>: Specifies the transformation to be applied to unique keys to generate unique long IDs from them. The following transformations are supported:

    • OBJECT: Object-to-long transformation. Each extracted unique key value is interpreted as a Java object instance. Internally, an object-to-long hash table is used to establish the mapping from unique keys to unique IDs. Good as a general-purpose transformation.

    • LONG: Long-to-long transformation. Each extracted unique key value is interpreted as a non-negative long value, the widening conversion from byte, short and int is performed, if necessary. Internally, a long-to-long hash table is used to establish the mapping from unique keys to unique IDs, which is more efficient than the object-to-long hash table. It is good for sparse/random unique integer-valued keys renumbering to raise the IDs density and to make the bitmap index more memory-efficient as a result.

    • RAW: Raw transformation. Each extracted unique key value is interpreted as a non-negative long value, the widening conversion from byte, short and int is performed, if necessary. Internally, no hash table of any kind is used to establish the mapping from unique keys to unique IDs, the raw extracted keys are used directly as IDs. It is good for dense unique integer-valued keys, and it has the best performance in terms of time and memory.

The regular dotted attribute path syntax is supported for <attr> and <key>:

  • XML

  • YAML

<index type="BITMAP">
    <attributes>
        <attribute>name.first</attribute>
    </attributes>
</index>
<index type="BITMAP">
    <attributes>
        <attribute>name.first</attribute>
    </attributes>
    <bitmap-index-options>
        <unique-key>__key.id</unique-key>
    </bitmap-index-options>
</index>
<index type="BITMAP">
    <attributes>
        <attribute>name.first</attribute>
    </attributes>
    <bitmap-index-options>
        <unique-key>id.external</unique-key>
    </bitmap-index-options>
</index>
      indexes:
        - type: BITMAP
          attributes:
            - name.first
        - type: BITMAP
          attributes:
            - name.first
          bitmap-index-options:
            unique-key: __key.id
        - type: BITMAP
          attributes:
            - name.first
          bitmap-index-options:
            unique-key: id.external

Collection and array indexing is also possible using the regular syntax:

  • XML

  • YAML

<index type="BITMAP">
    <attributes>
        <attribute>habits[any]</attribute>
    </attributes>
</index>
<index type="BITMAP">
    <attributes>
        <attribute>habits[0]</attribute>
    </attributes>
</index>
      indexes:
        - type: BITMAP
          attributes:
            - habits[any]
        - type: BITMAP
          attributes:
            - habits[0]

Bitmap Index Querying

Bitmap index matching and selection for queries are performed automatically. No special treatment is required. The querying can be performed using the regular IMap querying methods: IMap.values(Predicate), IMap.entrySet(Predicate), etc.

Copying Indexes

The underlying data structures used by the indexes need to copy the query results to make sure that the results are correct. This copying process is performed either when reading the index from the data structure (on-read) or writing to it (on-write).

On-read copying means that, for each index-read operation, the result of the query is copied before it is sent to the caller. Depending on the query result’s size, this type of index copying may be slower since the result is stored in a map, i.e., all entries need to have the hash calculated before being stored. Unlike the index-read operations, each index-write operation is fast, since there is no copying. So, this option can be preferred in index-write intensive cases.

On-write copying means that each index-write operation completely copies the underlying map to provide the copy-on-write semantics and this may be a slow operation depending on the index size. Unlike index-write operations, each index-read operation is fast since the operation only includes accessing the map that stores the results and returning them to the caller.

Another option is never copying the results of a query to a separate map. This means the results backed by the underlying index-map can change after the query has been executed (such as an entry might have been added or removed from an index, or it might have been remapped). This option can be preferred if you expect "mostly correct" results, i.e., if it is not a problem when some entries returned in the query result set do not match the initial query criteria. This is the fastest option since there is no copying.

You can set one of these options using the system property hazelcast.index.copy.behavior. The following values, which are explained in the above paragraphs, can be set:

  • COPY_ON_READ (the default value)

  • COPY_ON_WRITE

  • NEVER

The following is an example configuration snippet:

  • XML

  • YAML

<hazelcast>
    <cluster-name>dev</cluster-name>
    ...
    <properties>
        <property name="hazelcast.index.copy.behavior">NEVER</property>
    </properties>
    ...
</hazelcast>
hazelcast:
  cluster-name: dev
  ...
  properties:
    hazelcast.index.copy.behavior: NEVER
  ...

See also the Configuring with System Properties section for reference.

Usage of this system property is supported for BINARY and OBJECT in-memory formats. Only in Hazelcast 3.8.7, it is also supported for NATIVE in-memory format.

Indexing Attributes with ValueExtractor

You can also define custom attributes that may be referenced in predicates, queries and indexes. Custom attributes can be defined by implementing a ValueExtractor. See the Custom Attributes section for details.

Using "this" as an Attribute

You can use the keyword this as an attribute name while adding an index or creating a predicate. A basic usage is shown below.

map.addIndex(new IndexConfig(IndexType.SORTED, "this"));
Predicate<Integer, Integer> lessEqual = Predicates.between("this", 12, 20);

Another basic example using SQL predicate is shown below.

Predicates.sql("this = 'jones'")
Predicates.sql("this.age > 33")

The special attribute this acts on the value of a map entry. Typically, you do not need to specify it while accessing a property of an entry’s value, since its presence is implicitly assumed if the special attribute __key is not specified.

Configuring Query Thread Pool

You can change the size of thread pool dedicated to query operations using the pool-size property. Each query consumes a single thread from a Generic Operations ThreadPool on each Hazelcast member - let’s call it the query-orchestrating thread. That thread is blocked throughout the whole execution-span of a query on the member.

The query-orchestrating thread uses the threads from the query-thread pool in the following cases:

  • if you run a PagingPredicate (since each page runs as a separate task)

  • if you set the system property hazelcast.query.predicate.parallel.evaluation to true (since the predicates are evaluated in parallel)

See the Filtering with Paging Predicates section and System Properties appendix for information on paging predicates and for description of the above system property.

Below is an example of that declarative configuration.

  • XML

  • YAML

<hazelcast>
    ...
    <executor-service name="hz:query">
        <pool-size>100</pool-size>
    </executor-service>
    ...
</hazelcast>
hazelcast:
  ...
  executor-service:
    "hz:query":
      pool-size: 100

Below is the equivalent programmatic configuration.

Config cfg = new Config();
cfg.getExecutorConfig("hz:query").setPoolSize(100);

Query Requests from Clients

When dealing with the query requests coming from the clients to your members, Hazelcast offers the following system properties to tune your thread pools:

  • hazelcast.clientengine.thread.count which is the number of threads to process non-partition-aware client requests, like map.size() and executor tasks. Its default value is the number of cores multiplied by 20.

  • hazelcast.clientengine.query.thread.count which is the number of threads to process query requests coming from the clients. Its default value is the number of cores.

If there are a lot of query request from the clients, you may want to increase the value of hazelcast.clientengine.query.thread.count. In addition to this tuning, you may also consider increasing the value of hazelcast.clientengine.thread.count if the CPU load in your system is not high and there is plenty of free memory.