Querying Maps with the Predicate API

The Predicates API offers built-in methods for creating queries as well as an sql() method for SQL-like queries.

Assume that you have a map called employee that contains values of Employee objects:

public class Employee implements Serializable {
    private String name;
    private int age;
    private boolean active;
    private double salary;

    public Employee(String name, int age, boolean active, double salary) {
        this.name = name;
        this.age = age;
        this.active = active;
        this.salary = salary;
    }

    public Employee() {
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }

    public double getSalary() {
        return salary;
    }

    public boolean isActive() {
        return active;
    }
}

Now let’s look for the employees who are active and have an age less than 30.

When using Portable objects, if one field of an object exists on one member but does not exist on another one, Hazelcast does not throw an unknown field exception. Instead, Hazelcast treats that predicate, which tries to perform a query on an unknown field, as an always false predicate.

IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );

EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
Predicate predicate = e.is( "active" ).and( e.get( "age" ).lessThan( 30 ) );

Collection<Employee> employees = map.values( predicate );

In the above example code, predicate verifies whether the entry is active and its age value is less than 30. This predicate is applied to the employee map using the map.values(predicate) method. This method sends the predicate to all cluster members and merges the results coming from them. Since the predicate is communicated between the members, it needs to be serializable.

Predicates can also be applied to keySet(), entrySet() and localKeySet() methods of distributed maps.

Predicates Class Operators

The Predicates class includes many operators for your query requirements. The following are descriptions for some of them:

equal: Checks if the result of an expression is equal to a given value.
notEqual: Checks if the result of an expression is not equal to a given value.
instanceOf: Checks if the result of an expression has a certain type.
like: Checks if the result of an expression matches some string pattern. % (percentage sign) is the placeholder for many characters, (underscore) is placeholder for only one character.
ilike: A case-insensitive variant of like.
greaterThan: Checks if the result of an expression is greater than a certain value.
greaterEqual: Checks if the result of an expression is greater than or equal to a certain value.
lessThan: Checks if the result of an expression is less than a certain value.
lessEqual: Checks if the result of an expression is less than or equal to a certain value.
between: Checks if the result of an expression is between two values (this is inclusive).
in: Checks if the result of an expression is an element of a certain collection.
isNot: Checks if the result of an expression is false.
regex: Checks if the result of an expression matches some regular expression.
alwaysTrue: The result of an expression always matches.
alwaysFalse: The result of an expression ever matches.

See the Predicates Javadoc for all predicates provided.

Combining Predicates with AND, OR, NOT

You can combine predicates using the and, or and not operators, as shown in the below examples.

public Collection<Employee> getWithNameAndAge( String name, int age ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate agePredicate = Predicates.equal( "age", age );
    Predicate predicate = Predicates.and( namePredicate, agePredicate );
    return employeeMap.values( predicate );
}

public Collection<Employee> getWithNameOrAge( String name, int age ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate agePredicate = Predicates.equal( "age", age );
    Predicate predicate = Predicates.or( namePredicate, agePredicate );
    return employeeMap.values( predicate );
}

public Collection<Employee> getNotWithName( String name ) {
    Predicate namePredicate = Predicates.equal( "name", name );
    Predicate predicate = Predicates.not( namePredicate );
    return employeeMap.values( predicate );
}

Simplifying with PredicateBuilder

You can simplify predicate usage with the PredicateBuilder interface, which offers simpler predicate building. See the below example code which selects all people with a certain name and age.

public Collection<Employee> getWithNameAndAgeSimplified( String name, int age ) {
    EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
    Predicate agePredicate = e.get( "age" ).equal( age );
    Predicate predicate = e.get( "name" ).equal( name ).and( agePredicate );
    return employeeMap.values( predicate );
}

Querying with SQL-like Predicates

Predicates.sql() takes the regular SQL where clause. Here is an example:

IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );
Set<Employee> employees = map.values( Predicates.sql( "active AND age < 30" ) );

Hazelcast offers an SQL service that allows you to execute SQL queries, as opposed to SQL-like predicates in case of Predicates.sql(). See SQL Overview for more information.

Supported SQL Syntax

AND/OR: `<expression> AND <expression> AND <expression>… `

active AND age>30
active=false OR age = 45 OR name = 'Joe'
active AND ( age > 20 OR salary < 60000 )

Equality: =, !=, <, ⇐, >, >=

<expression> = value
age ⇐ 30
name = 'Joe'
salary != 50000

BETWEEN: <attribute> [NOT] BETWEEN <value1> AND <value2>

age BETWEEN 20 AND 33 ( same as age >= 20 AND age ⇐ 33 )
age NOT BETWEEN 30 AND 40 ( same as age < 30 OR age > 40 )

IN: <attribute> [NOT] IN (val1, val2,…)

age IN ( 20, 30, 40 )
age NOT IN ( 60, 70 )
active AND ( salary >= 50000 OR ( age NOT BETWEEN 20 AND 30 ) )
age IN ( 20, 30, 40 ) AND salary BETWEEN ( 50000, 80000 )

LIKE: <attribute> [NOT] LIKE "expression"

The % (percentage sign) is placeholder for multiple characters, an _ (underscore) is placeholder for only one character.

name LIKE 'Jo%' (true for 'Joe', 'Josh', 'Joseph' etc.)
name LIKE 'Jo_' (true for 'Joe'; false for 'Josh')
name NOT LIKE 'Jo_' (true for 'Josh'; false for 'Joe')
name LIKE 'J_s%' (true for 'Josh', 'Joseph'; false 'John', 'Joe')

ILIKE: <attribute> [NOT] ILIKE 'expression'

Similar to LIKE predicate but in a case-insensitive manner.

name ILIKE 'Jo%' (true for 'Joe', 'joe', 'jOe','Josh','joSH', etc.)
name ILIKE 'Jo_' (true for 'Joe' or 'jOE'; false for 'Josh')

REGEX: <attribute> [NOT] REGEX 'expression'

name REGEX 'abc-.*' (true for 'abc-123'; false for 'abx-123')

You can escape the % and _ placeholder characters in your SQL queries with predicates using the backslash (\) character. The apostrophe (') can be escaped with another apostrophe, i.e., ''. If you use REGEX, you need to escape characters according to the normal Java escape syntax; see here for the details.

Querying Entry Keys with Predicates

You can use __key attribute to perform a predicated search for entry keys. See the following example:

IMap<String, Person> personMap = hazelcastInstance.getMap(persons);
personMap.put("Alice", new Person("Alice", 35, Gender.FEMALE));
personMap.put("Andy",  new Person("Andy",  37, Gender.MALE));
personMap.put("Bob",   new Person("Bob",   22, Gender.MALE));
[...]
Predicate predicate = Predicates.sql("__key like A%");
Collection<Person> startingWithA = personMap.values(predicate);

In this example, the code creates a collection with the entries whose keys start with the letter "A”.

Querying JSON Strings

You can query JSON strings stored inside your Hazelcast clusters. To query a JSON string, you first need to create a HazelcastJsonValue from the JSON string. You can use HazelcastJsonValue objects both as keys and values in distributed data structures. Then, it is possible to query these objects using the Hazelcast query methods explained in this section.

String person1 = "{ \"name\": \"John\", \"age\": 35 }";
String person2 = "{ \"name\": \"Jane\", \"age\": 24 }";
String person3 = "{ \"name\": \"Trey\", \"age\": 17 }";

IMap<Integer, HazelcastJsonValue> idPersonMap = instance.getMap("jsonValues");

idPersonMap.put(1, new HazelcastJsonValue(person1));
idPersonMap.put(2, new HazelcastJsonValue(person2));
idPersonMap.put(3, new HazelcastJsonValue(person3));

Collection<HazelcastJsonValue> peopleUnder21 = idPersonMap.values(Predicates.lessThan("age", 21));

When running the queries, Hazelcast treats values extracted from the JSON documents as Java types so they can be compared with the query attribute. JSON specification defines five primitive types to be used in the JSON documents: number,string, true, false and null. The string, true/false and null types are treated as String, boolean and null, respectively. We treat the extracted number values as long where possible. Otherwise, number types are treated as double.

It is possible to query nested attributes and arrays in JSON documents, using the Predicates API. The query syntax is the same as querying other Hazelcast objects as explained in the Querying in Collections and Arrays section.

/**
 * Sample JSON object
 *
 * {
 *     "departmentId": 1,
 *     "room": "alpha",
 *     "people": [
 *         {
 *             "name": "Peter",
 *             "age": 26,
 *             "salary": 50000
 *         },
 *         {
 *             "name": "Jonah",
 *             "age": 50,
 *             "salary": 140000
 *         }
 *     ]
 * }
 *
 *
 * The following query finds all the departments that have a person named "Peter" working in them.
 */
Collection<HazelcastJsonValue> departmentWithPeter = departments.values(Predicates.equal("people[any].name", "Peter"));

HazelcastJsonValue is a lightweight wrapper around your JSON strings. It is used merely as a way to indicate that the contained string should be treated as a valid JSON value. Hazelcast does not check the validity of JSON strings that are added to maps. Adding an invalid JSON string in a map is permissible. However, in that case whether such an entry is going to be returned or not from a query is not defined.

Metadata Creation for JSON Querying

By default, for each HazelcastJsonValue object stored in a map, Hazelcast also stores a metadata object, which helps you to query these objects faster.

This metadata object is created every time a HazelcastJsonValue is put into a map and stored in the on-heap or off-heap memory, depending on the map’s in-memory format setting.

Depending on your application’s needs, you may want to turn off metadata creation to both decrease latency when adding objects to a map and to increase throughput. You can configure this setting using a metadata policy.

Filtering with Paging Predicates

Hazelcast provides paging for defined predicates. With its PagingPredicate interface, you can get a collection of keys, values, or entries page by page by filtering them with predicates and giving the size of the pages. Also, you can sort the entries by specifying comparators. In this case, the comparator should be Serializable and the serialization factory implementations you use, e.g., PortableFactory and DataSerializableFactory, should be registered. See the Serialization chapter on how to register these factories.

Paging predicates require the objects to be deserialized both on the calling side (either a member or client) and the member side from which the collection is retrieved. Therefore, you need to register the serialization factories you use on all the members and clients on which the paging predicates are used. See the Serialization chapter on how to register these factories.

In the example code below:

The greaterEqual predicate gets values from the "students" map. This predicate has a filter to retrieve the objects with an "age" greater than or equal to 18.
Then a PagingPredicate is constructed in which the page size is 5, so that there are five objects in each page. The first time the values() method is called, the first page is fetched.
Finally, the subsequent page is fetched by calling the nextPage() method of PagingPredicate and querying the map again with the updated PagingPredicate.

IMap<Integer, Student> map = hazelcastInstance.getMap( "students" );
Predicate greaterEqual = Predicates.greaterEqual( "age", 18 );
PagingPredicate pagingPredicate = Predicates.pagingPredicate( greaterEqual, 5 );
// Retrieve the first page
Collection<Student> values = map.values( pagingPredicate );
...
// Set up next page
pagingPredicate.nextPage();
// Retrieve next page
values = map.values( pagingPredicate );
...

If a comparator is not specified for PagingPredicate, but you want to get a collection of keys or values page by page, keys or values must be instances of Comparable (i.e., they must implement java.lang.Comparable). Otherwise, the java.lang.IllegalArgument exception is thrown.

You can also access a specific page more easily with the help of the setPage() method. This way, if you make a query for the hundredth page, for example, it gets all 100 pages at once instead of reaching the hundredth page one by one using the nextPage() method. Note that this feature tires the memory and see the PagingPredicate Javadoc.

Paging Predicate, also known as Order & Limit, is not supported in Transactional Context.

Filtering with Partition Predicate

You can run queries on a single partition in your cluster using the partition predicate (PartitionPredicate).

The Predicates.partitionPredicate() method takes a predicate and partition key as parameters, gets the partition ID using the key and runs that predicate only on the partition where that key belongs.

See the following code snippet:

...
Predicate predicate = Predicates.partitionPredicate(partitionKey, Predicates.alwaysTrue());

Collection<Integer> values = map.values(predicate);
Collection<String> keys = map.keySet(predicate);
...

By default there are 271 partitions, and using a regular predicate, each partition needs to be accessed. However, if the partition predicate only accesses a single partition, this can lead to a big performance gain.

For the partition predicate to work correctly, you need to know which partition your data belongs to so that you can send the request to the correct partition. One of the ways of doing it is to make use of the PartitionAware interface when data is inserted, thereby controlling the owning partition. See the PartitionAware section for more information and examples.

A concrete example may be a web shop that sells phones and accessories. To find all the accessories of a phone, a query could be executed that selects all accessories for that phone. This query is executed on all members in the cluster and therefore could generate quite a lot of load. However, if we would store the accessories in the same partition as the phone, the partition predicate could use the partitionKey of the phone to select the right partition and then it queries for the accessories for that phone; and this reduces the load on the system and get faster query results.

Configuring the Query Thread Pool

You can change the size of thread pool dedicated to query operations using the pool-size property. Each query consumes a single thread from a Generic Operations ThreadPool on each Hazelcast member - let’s call it the query-orchestrating thread. That thread is blocked throughout the whole execution-span of a query on the member.

The query-orchestrating thread uses the threads from the query-thread pool in the following cases:

if you run a PagingPredicate (since each page runs as a separate task)
if you set the system property hazelcast.query.predicate.parallel.evaluation to true (since the predicates are evaluated in parallel)

See the Filtering with Paging Predicates section and System Properties appendix for information about paging predicates and for description of the above system property.

Below is an example of that declarative configuration.

XML
YAML

<hazelcast>
    ...
    <executor-service name="hz:query">
        <pool-size>100</pool-size>
    </executor-service>
    ...
</hazelcast>

hazelcast:
  ...
  executor-service:
    "hz:query":
      pool-size: 100

Below is the equivalent programmatic configuration.

Config cfg = new Config();
cfg.getExecutorConfig("hz:query").setPoolSize(100);

Query Requests from Clients

When dealing with the query requests coming from the clients to your members, Hazelcast offers the following system properties to tune your thread pools:

hazelcast.clientengine.thread.count which is the number of threads to process non-partition-aware client requests, like map.size() and executor tasks. Its default value is the number of cores multiplied by 20.
hazelcast.clientengine.query.thread.count which is the number of threads to process query requests coming from the clients. Its default value is the number of cores.

If there are a lot of query request from the clients, you may want to increase the value of hazelcast.clientengine.query.thread.count. In addition to this tuning, you may also consider increasing the value of hazelcast.clientengine.thread.count if the CPU load in your system is not high and there is plenty of free memory.

Next Steps

Keep an up-to-date record of query results, using a continuous query cache.

Create more complex queries with custom attributes.