Querying Maps with the Predicate API
The Predicates API offers built-in methods for creating queries as well as an sql()
method for SQL-like queries.
Assume that you have a map called employee
that contains values of
Employee
objects:
public class Employee implements Serializable {
private String name;
private int age;
private boolean active;
private double salary;
public Employee(String name, int age, boolean active, double salary) {
this.name = name;
this.age = age;
this.active = active;
this.salary = salary;
}
public Employee() {
}
public String getName() {
return name;
}
public int getAge() {
return age;
}
public double getSalary() {
return salary;
}
public boolean isActive() {
return active;
}
}
Now let’s look for the employees who are active and have an age less than 30.
When using Portable objects, if one field of an object exists on one member but does not exist on another one, Hazelcast does not throw an unknown field exception. Instead, Hazelcast treats that predicate, which tries to perform a query on an unknown field, as an always false predicate. |
IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );
EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
Predicate predicate = e.is( "active" ).and( e.get( "age" ).lessThan( 30 ) );
Collection<Employee> employees = map.values( predicate );
In the above example code, predicate
verifies whether the entry is
active and its age
value is less than 30. This predicate
is
applied to the employee
map using the map.values(predicate)
method.
This method sends the predicate to all cluster members
and merges the results coming from them. Since the predicate is
communicated between the members, it needs to
be serializable.
Predicates can also be applied to keySet() , entrySet() and
localKeySet() methods of distributed maps.
|
Predicates Class Operators
The Predicates
class includes many operators for your query requirements.
The following are descriptions for some of them:
-
equal
: Checks if the result of an expression is equal to a given value. -
notEqual
: Checks if the result of an expression is not equal to a given value. -
instanceOf
: Checks if the result of an expression has a certain type. -
like
: Checks if the result of an expression matches some string pattern. % (percentage sign) is the placeholder for many characters, (underscore) is placeholder for only one character. -
ilike
: A case-insensitive variant oflike
. -
greaterThan
: Checks if the result of an expression is greater than a certain value. -
greaterEqual
: Checks if the result of an expression is greater than or equal to a certain value. -
lessThan
: Checks if the result of an expression is less than a certain value. -
lessEqual
: Checks if the result of an expression is less than or equal to a certain value. -
between
: Checks if the result of an expression is between two values (this is inclusive). -
in
: Checks if the result of an expression is an element of a certain collection. -
isNot
: Checks if the result of an expression is false. -
regex
: Checks if the result of an expression matches some regular expression. -
alwaysTrue
: The result of an expression always matches. -
alwaysFalse
: The result of an expression ever matches.
See the Predicates Javadoc for all predicates provided. |
Combining Predicates with AND, OR, NOT
You can combine predicates using the and
, or
and not
operators,
as shown in the below examples.
public Collection<Employee> getWithNameAndAge( String name, int age ) {
Predicate namePredicate = Predicates.equal( "name", name );
Predicate agePredicate = Predicates.equal( "age", age );
Predicate predicate = Predicates.and( namePredicate, agePredicate );
return employeeMap.values( predicate );
}
public Collection<Employee> getWithNameOrAge( String name, int age ) {
Predicate namePredicate = Predicates.equal( "name", name );
Predicate agePredicate = Predicates.equal( "age", age );
Predicate predicate = Predicates.or( namePredicate, agePredicate );
return employeeMap.values( predicate );
}
public Collection<Employee> getNotWithName( String name ) {
Predicate namePredicate = Predicates.equal( "name", name );
Predicate predicate = Predicates.not( namePredicate );
return employeeMap.values( predicate );
}
Simplifying with PredicateBuilder
You can simplify predicate usage with the PredicateBuilder
interface,
which offers simpler predicate building. See the
below example code which selects all people with a certain name and age.
public Collection<Employee> getWithNameAndAgeSimplified( String name, int age ) {
EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
Predicate agePredicate = e.get( "age" ).equal( age );
Predicate predicate = e.get( "name" ).equal( name ).and( agePredicate );
return employeeMap.values( predicate );
}
Querying with SQL-like Predicates
Predicates.sql()
takes the regular SQL where
clause.
Here is an example:
IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );
Set<Employee> employees = map.values( Predicates.sql( "active AND age < 30" ) );
Hazelcast offers an SQL service that allows you to execute SQL queries,
as opposed to SQL-like predicates in case of Predicates.sql() . See
SQL Overview for more information.
|
Supported SQL Syntax
AND/OR: `<expression> AND <expression> AND <expression>… `
-
active AND age>30
-
active=false OR age = 45 OR name = 'Joe'
-
active AND ( age > 20 OR salary < 60000 )
Equality: =, !=, <, ⇐, >, >=
-
<expression> = value
-
age ⇐ 30
-
name = 'Joe'
-
salary != 50000
BETWEEN: <attribute> [NOT] BETWEEN <value1> AND <value2>
-
age BETWEEN 20 AND 33 ( same as age >= 20 AND age ⇐ 33 )
-
age NOT BETWEEN 30 AND 40 ( same as age < 30 OR age > 40 )
IN: <attribute> [NOT] IN (val1, val2,…)
-
age IN ( 20, 30, 40 )
-
age NOT IN ( 60, 70 )
-
active AND ( salary >= 50000 OR ( age NOT BETWEEN 20 AND 30 ) )
-
age IN ( 20, 30, 40 ) AND salary BETWEEN ( 50000, 80000 )
LIKE: <attribute> [NOT] LIKE "expression"
The %
(percentage sign) is placeholder for multiple characters,
an _
(underscore) is placeholder for only one character.
-
name LIKE 'Jo%'
(true for 'Joe', 'Josh', 'Joseph' etc.) -
name LIKE 'Jo_'
(true for 'Joe'; false for 'Josh') -
name NOT LIKE 'Jo_'
(true for 'Josh'; false for 'Joe') -
name LIKE 'J_s%'
(true for 'Josh', 'Joseph'; false 'John', 'Joe')
ILIKE: <attribute> [NOT] ILIKE 'expression'
Similar to LIKE predicate but in a case-insensitive manner.
-
name ILIKE 'Jo%'
(true for 'Joe', 'joe', 'jOe','Josh','joSH', etc.) -
name ILIKE 'Jo_'
(true for 'Joe' or 'jOE'; false for 'Josh')
REGEX: <attribute> [NOT] REGEX 'expression'
-
name REGEX 'abc-.*'
(true for 'abc-123'; false for 'abx-123')
You can escape the % and _ placeholder characters in your
SQL queries with predicates using the
backslash (\ ) character. The apostrophe (' ) can be escaped with another
apostrophe, i.e., '' . If you use REGEX, you need to escape characters
according to the normal Java escape syntax; see here
for the details.
|
Querying Entry Keys with Predicates
You can use __key
attribute to perform a predicated search for entry
keys. See the following example:
IMap<String, Person> personMap = hazelcastInstance.getMap(persons);
personMap.put("Alice", new Person("Alice", 35, Gender.FEMALE));
personMap.put("Andy", new Person("Andy", 37, Gender.MALE));
personMap.put("Bob", new Person("Bob", 22, Gender.MALE));
[...]
Predicate predicate = Predicates.sql("__key like A%");
Collection<Person> startingWithA = personMap.values(predicate);
In this example, the code creates a collection with the entries whose keys start with the letter "A”.
Querying JSON Strings
You can query JSON strings stored inside your Hazelcast clusters. To
query a JSON string,
you first need to create a HazelcastJsonValue
from the JSON string.
You can use HazelcastJsonValue
objects both as keys and values in
distributed data structures. Then, it is
possible to query these objects using the Hazelcast query methods
explained in this section.
String person1 = "{ \"name\": \"John\", \"age\": 35 }";
String person2 = "{ \"name\": \"Jane\", \"age\": 24 }";
String person3 = "{ \"name\": \"Trey\", \"age\": 17 }";
IMap<Integer, HazelcastJsonValue> idPersonMap = instance.getMap("jsonValues");
idPersonMap.put(1, new HazelcastJsonValue(person1));
idPersonMap.put(2, new HazelcastJsonValue(person2));
idPersonMap.put(3, new HazelcastJsonValue(person3));
Collection<HazelcastJsonValue> peopleUnder21 = idPersonMap.values(Predicates.lessThan("age", 21));
When running the queries, Hazelcast treats values extracted from
the JSON documents as Java types so they
can be compared with the query attribute. JSON specification
defines five primitive types to be used in the JSON
documents: number
,string
, true
, false
and null
. The string
,
true/false
and null
types are treated
as String
, boolean
and null
, respectively. We treat the extracted
number
values as long
where possible. Otherwise, number
types are treated
as double
.
It is possible to query nested attributes and arrays in JSON documents, using the Predicates API. The query syntax is the same as querying other Hazelcast objects as explained in the Querying in Collections and Arrays section.
/**
* Sample JSON object
*
* {
* "departmentId": 1,
* "room": "alpha",
* "people": [
* {
* "name": "Peter",
* "age": 26,
* "salary": 50000
* },
* {
* "name": "Jonah",
* "age": 50,
* "salary": 140000
* }
* ]
* }
*
*
* The following query finds all the departments that have a person named "Peter" working in them.
*/
Collection<HazelcastJsonValue> departmentWithPeter = departments.values(Predicates.equal("people[any].name", "Peter"));
HazelcastJsonValue
is a lightweight wrapper around your JSON strings.
It is used merely as a way to indicate
that the contained string should be treated as a valid JSON value.
Hazelcast does not check the validity of JSON
strings that are added to maps. Adding an invalid JSON string in a map is
permissible. However, in that case
whether such an entry is going to be returned or not from a query is not defined.
Metadata Creation for JSON Querying
By default, for each HazelcastJsonValue
object stored in a map, Hazelcast also stores a metadata object, which helps you to query these objects faster.
This metadata object is created every time
a HazelcastJsonValue
is put into a map and stored in the on-heap or off-heap
memory, depending on the map’s in-memory format setting.
Depending on your application’s needs, you may want to turn off metadata creation to both decrease latency when adding objects to a map and to increase throughput. You can configure this setting using a metadata policy.
Filtering with Paging Predicates
Hazelcast provides paging for defined predicates. With its PagingPredicate
interface, you can
get a collection of keys, values, or entries page by page by filtering
them with predicates and giving the size of the pages. Also, you
can sort the entries by specifying comparators. In this case, the comparator
should be Serializable
and the serialization factory implementations you use,
e.g., PortableFactory
and DataSerializableFactory
, should be registered.
See the Serialization chapter on how to register these
factories.
Paging predicates require the objects to be deserialized both on the calling side (either a member or client) and the member side from which the collection is retrieved. Therefore, you need to register the serialization factories you use on all the members and clients on which the paging predicates are used. See the Serialization chapter on how to register these factories.
In the example code below:
-
The
greaterEqual
predicate gets values from the "students" map. This predicate has a filter to retrieve the objects with an "age" greater than or equal to 18. -
Then a
PagingPredicate
is constructed in which the page size is 5, so that there are five objects in each page. The first time thevalues()
method is called, the first page is fetched. -
Finally, the subsequent page is fetched by calling the
nextPage()
method ofPagingPredicate
and querying the map again with the updatedPagingPredicate
.
IMap<Integer, Student> map = hazelcastInstance.getMap( "students" );
Predicate greaterEqual = Predicates.greaterEqual( "age", 18 );
PagingPredicate pagingPredicate = Predicates.pagingPredicate( greaterEqual, 5 );
// Retrieve the first page
Collection<Student> values = map.values( pagingPredicate );
...
// Set up next page
pagingPredicate.nextPage();
// Retrieve next page
values = map.values( pagingPredicate );
...
If a comparator is not specified for PagingPredicate
, but you want
to get a collection of keys or values page by page, keys or values must
be instances of Comparable
(i.e., they must implement java.lang.Comparable
).
Otherwise, the java.lang.IllegalArgument
exception is thrown.
You can also access a specific page more
easily with the help of the setPage()
method. This way, if you make
a query for the hundredth page, for example, it gets all 100 pages at
once instead of reaching the hundredth page one by one using the nextPage()
method.
Note that this feature tires the memory and see the
PagingPredicate Javadoc.
Paging Predicate, also known as Order & Limit, is not supported in Transactional Context. |
Filtering with Partition Predicate
You can run queries on a single partition in your cluster using
the partition predicate (PartitionPredicate
).
The Predicates.partitionPredicate()
method takes a predicate and partition key
as parameters, gets the partition ID using the key and runs that predicate only
on the partition where that key belongs.
See the following code snippet:
...
Predicate predicate = Predicates.partitionPredicate(partitionKey, Predicates.alwaysTrue());
Collection<Integer> values = map.values(predicate);
Collection<String> keys = map.keySet(predicate);
...
By default there are 271 partitions, and using a regular predicate, each partition needs to be accessed. However, if the partition predicate only accesses a single partition, this can lead to a big performance gain.
For the partition predicate to work correctly, you need to know which
partition your data belongs to so that you can send the
request to the correct partition. One of the ways of doing it is to
make use of the PartitionAware
interface when data is
inserted, thereby controlling the owning partition. See the
PartitionAware section for more information and examples.
A concrete example may be a web shop that sells phones and accessories.
To find all the accessories of a phone,
a query could be executed that selects all accessories for that phone.
This query is executed on all members in the cluster and
therefore could generate quite a lot of load. However, if we would store
the accessories in the same partition as the phone, the
partition predicate could use the partitionKey
of the phone to select
the right partition and then it queries for
the accessories for that phone; and this reduces the load on the system
and get faster query results.
Configuring the Query Thread Pool
You can change the size of thread pool dedicated to query operations
using the pool-size
property. Each query consumes a single thread
from a Generic Operations ThreadPool on each Hazelcast member - let’s
call it the query-orchestrating thread. That thread is blocked throughout
the whole execution-span of a query on the member.
The query-orchestrating thread uses the threads from the query-thread pool in the following cases:
-
if you run a
PagingPredicate
(since each page runs as a separate task) -
if you set the system property
hazelcast.query.predicate.parallel.evaluation
to true (since the predicates are evaluated in parallel)
See the Filtering with Paging Predicates section and System Properties appendix for information about paging predicates and for description of the above system property.
Below is an example of that declarative configuration.
<hazelcast>
...
<executor-service name="hz:query">
<pool-size>100</pool-size>
</executor-service>
...
</hazelcast>
hazelcast:
...
executor-service:
"hz:query":
pool-size: 100
Below is the equivalent programmatic configuration.
Config cfg = new Config();
cfg.getExecutorConfig("hz:query").setPoolSize(100);
Query Requests from Clients
When dealing with the query requests coming from the clients to your members, Hazelcast offers the following system properties to tune your thread pools:
-
hazelcast.clientengine.thread.count
which is the number of threads to process non-partition-aware client requests, likemap.size()
and executor tasks. Its default value is the number of cores multiplied by 20. -
hazelcast.clientengine.query.thread.count
which is the number of threads to process query requests coming from the clients. Its default value is the number of cores.
If there are a lot of query request from the clients, you may want to
increase the value of hazelcast.clientengine.query.thread.count
. In
addition to this tuning, you may also consider increasing the value of
hazelcast.clientengine.thread.count
if the CPU load in your system is
not high and there is plenty of free memory.
Next Steps
Keep an up-to-date record of query results, using a continuous query cache.
Create more complex queries with custom attributes.