How Distributed Query Works
The requested predicate is sent to each member in the cluster. Each member looks at its own local entries and filters them according to the predicate. At this stage, key/value pairs of the entries are deserialized and then passed to the predicate. The predicate requester merges all the results coming from each member into a single set.
Distributed query is highly scalable. If you add new members to the cluster, the partition count for each member is reduced and thus the time spent by each member on iterating its entries is reduced. In addition, the pool of partition threads evaluates the entries concurrently in each member and the network traffic is also reduced since only filtered data is sent to the requester.
Hazelcast offers the following APIs for distributed query purposes:
-
Criteria API
-
Querying with SQL-like Predicates
Employee Map Query Example
Assume that you have an "employee" map containing values of
Employee
objects, as coded below.
public class Employee implements Serializable {
private String name;
private int age;
private boolean active;
private double salary;
public Employee(String name, int age, boolean active, double salary) {
this.name = name;
this.age = age;
this.active = active;
this.salary = salary;
}
public Employee() {
}
public String getName() {
return name;
}
public int getAge() {
return age;
}
public double getSalary() {
return salary;
}
public boolean isActive() {
return active;
}
}
Now let’s look for the employees who are active and have an age less than 30 using the aforementioned APIs (Criteria API and Distributed SQL Query). The following subsections describe each query mechanism for this example.
When using Portable objects, if one field of an object exists on one member but does not exist on another one, Hazelcast does not throw an unknown field exception. Instead, Hazelcast treats that predicate, which tries to perform a query on an unknown field, as an always false predicate. |
Querying with Criteria API
Criteria API is a programming interface offered by Hazelcast that is similar to the Java Persistence Query Language (JPQL). Below is the code for the above example query.
IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );
EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
Predicate predicate = e.is( "active" ).and( e.get( "age" ).lessThan( 30 ) );
Collection<Employee> employees = map.values( predicate );
In the above example code, predicate
verifies whether the entry is
active and its age
value is less than 30. This predicate
is
applied to the employee
map using the map.values(predicate)
method.
This method sends the predicate to all cluster members
and merges the results coming from them. Since the predicate is
communicated between the members, it needs to
be serializable.
Predicates can also be applied to keySet , entrySet and
localKeySet of the Hazelcast distributed map.
|
Predicates Class Operators
The Predicates
class includes many operators for your query requirements.
The following are descriptions for some of them:
-
equal
: Checks if the result of an expression is equal to a given value. -
notEqual
: Checks if the result of an expression is not equal to a given value. -
instanceOf
: Checks if the result of an expression has a certain type. -
like
: Checks if the result of an expression matches some string pattern. % (percentage sign) is the placeholder for many characters, (underscore) is placeholder for only one character. -
ilike
: A case-insensitive variant oflike
. -
greaterThan
: Checks if the result of an expression is greater than a certain value. -
greaterEqual
: Checks if the result of an expression is greater than or equal to a certain value. -
lessThan
: Checks if the result of an expression is less than a certain value. -
lessEqual
: Checks if the result of an expression is less than or equal to a certain value. -
between
: Checks if the result of an expression is between two values (this is inclusive). -
in
: Checks if the result of an expression is an element of a certain collection. -
isNot
: Checks if the result of an expression is false. -
regex
: Checks if the result of an expression matches some regular expression. -
alwaysTrue
: The result of an expression always matches. -
alwaysFalse
: The result of an expression ever matches.
See the Predicates Javadoc for all predicates provided. |
Combining Predicates with AND, OR, NOT
You can combine predicates using the and
, or
and not
operators,
as shown in the below examples.
public Collection<Employee> getWithNameAndAge( String name, int age ) {
Predicate namePredicate = Predicates.equal( "name", name );
Predicate agePredicate = Predicates.equal( "age", age );
Predicate predicate = Predicates.and( namePredicate, agePredicate );
return employeeMap.values( predicate );
}
public Collection<Employee> getWithNameOrAge( String name, int age ) {
Predicate namePredicate = Predicates.equal( "name", name );
Predicate agePredicate = Predicates.equal( "age", age );
Predicate predicate = Predicates.or( namePredicate, agePredicate );
return employeeMap.values( predicate );
}
public Collection<Employee> getNotWithName( String name ) {
Predicate namePredicate = Predicates.equal( "name", name );
Predicate predicate = Predicates.not( namePredicate );
return employeeMap.values( predicate );
}
Simplifying with PredicateBuilder
You can simplify predicate usage with the PredicateBuilder
interface,
which offers simpler predicate building. See the
below example code which selects all people with a certain name and age.
public Collection<Employee> getWithNameAndAgeSimplified( String name, int age ) {
EntryObject e = Predicates.newPredicateBuilder().getEntryObject();
Predicate agePredicate = e.get( "age" ).equal( age );
Predicate predicate = e.get( "name" ).equal( name ).and( agePredicate );
return employeeMap.values( predicate );
}
Querying with SQL-like Predicates
Predicates.sql()
takes the regular SQL where
clause.
Here is an example:
IMap<String, Employee> map = hazelcastInstance.getMap( "employee" );
Set<Employee> employees = map.values( Predicates.sql( "active AND age < 30" ) );
Hazelcast offers an SQL service that allows to execute SQL queries,
as opposed to SQL-like predicates in case of Predicates.sql() . See the
SQL chapter for more information.
|
Supported SQL Syntax
AND/OR: `<expression> AND <expression> AND <expression>… `
-
active AND age>30
-
active=false OR age = 45 OR name = 'Joe'
-
active AND ( age > 20 OR salary < 60000 )
Equality: =, !=, <, ⇐, >, >=
-
<expression> = value
-
age ⇐ 30
-
name = 'Joe'
-
salary != 50000
BETWEEN: <attribute> [NOT] BETWEEN <value1> AND <value2>
-
age BETWEEN 20 AND 33 ( same as age >= 20 AND age ⇐ 33 )
-
age NOT BETWEEN 30 AND 40 ( same as age < 30 OR age > 40 )
IN: <attribute> [NOT] IN (val1, val2,…)
-
age IN ( 20, 30, 40 )
-
age NOT IN ( 60, 70 )
-
active AND ( salary >= 50000 OR ( age NOT BETWEEN 20 AND 30 ) )
-
age IN ( 20, 30, 40 ) AND salary BETWEEN ( 50000, 80000 )
LIKE: <attribute> [NOT] LIKE "expression"
The %
(percentage sign) is placeholder for multiple characters,
an _
(underscore) is placeholder for only one character.
-
name LIKE 'Jo%'
(true for 'Joe', 'Josh', 'Joseph' etc.) -
name LIKE 'Jo_'
(true for 'Joe'; false for 'Josh') -
name NOT LIKE 'Jo_'
(true for 'Josh'; false for 'Joe') -
name LIKE 'J_s%'
(true for 'Josh', 'Joseph'; false 'John', 'Joe')
ILIKE: <attribute> [NOT] ILIKE 'expression'
Similar to LIKE predicate but in a case-insensitive manner.
-
name ILIKE 'Jo%'
(true for 'Joe', 'joe', 'jOe','Josh','joSH', etc.) -
name ILIKE 'Jo_'
(true for 'Joe' or 'jOE'; false for 'Josh')
REGEX: <attribute> [NOT] REGEX 'expression'
-
name REGEX 'abc-.*'
(true for 'abc-123'; false for 'abx-123')
You can escape the % and _ placeholder characters in your
SQL queries with predicates using the
backslash (\ ) character. The apostrophe (' ) can be escaped with another
apostrophe, i.e., '' . If you use REGEX, you need to escape characters
according to the normal Java escape syntax; see here
for the details.
|
Querying Entry Keys with Predicates
You can use __key
attribute to perform a predicated search for entry
keys. See the following example:
IMap<String, Person> personMap = hazelcastInstance.getMap(persons);
personMap.put("Alice", new Person("Alice", 35, Gender.FEMALE));
personMap.put("Andy", new Person("Andy", 37, Gender.MALE));
personMap.put("Bob", new Person("Bob", 22, Gender.MALE));
[...]
Predicate predicate = Predicates.sql("__key like A%");
Collection<Person> startingWithA = personMap.values(predicate);
In this example, the code creates a collection with the entries whose keys start with the letter "A”.
Querying JSON Strings
You can query JSON strings stored inside your Hazelcast clusters. To
query a JSON string,
you first need to create a HazelcastJsonValue
from the JSON string.
You can use HazelcastJsonValue
s both as keys and values in the
distributed data structures. Then, it is
possible to query these objects using the Hazelcast query methods
explained in this section.
String person1 = "{ \"name\": \"John\", \"age\": 35 }";
String person2 = "{ \"name\": \"Jane\", \"age\": 24 }";
String person3 = "{ \"name\": \"Trey\", \"age\": 17 }";
IMap<Integer, HazelcastJsonValue> idPersonMap = instance.getMap("jsonValues");
idPersonMap.put(1, new HazelcastJsonValue(person1));
idPersonMap.put(2, new HazelcastJsonValue(person2));
idPersonMap.put(3, new HazelcastJsonValue(person3));
Collection<HazelcastJsonValue> peopleUnder21 = idPersonMap.values(Predicates.lessThan("age", 21));
When running the queries, Hazelcast treats values extracted from
the JSON documents as Java types so they
can be compared with the query attribute. JSON specification
defines five primitive types to be used in the JSON
documents: number
,string
, true
, false
and null
. The string
,
true/false
and null
types are treated
as String
, boolean
and null
, respectively. We treat the extracted
number
values as long
s if they
can be represented by a long
. Otherwise, number
s are treated
as double
s.
It is possible to query nested attributes and arrays in JSON documents. The query syntax is the same as querying other Hazelcast objects as explained in the Querying in Collections and Arrays section.
/**
* Sample JSON object
*
* {
* "departmentId": 1,
* "room": "alpha",
* "people": [
* {
* "name": "Peter",
* "age": 26,
* "salary": 50000
* },
* {
* "name": "Jonah",
* "age": 50,
* "salary": 140000
* }
* ]
* }
*
*
* The following query finds all the departments that have a person named "Peter" working in them.
*/
Collection<HazelcastJsonValue> departmentWithPeter = departments.values(Predicates.equal("people[any].name", "Peter"));
HazelcastJsonValue
is a lightweight wrapper around your JSON strings.
It is used merely as a way to indicate
that the contained string should be treated as a valid JSON value.
Hazelcast does not check the validity of JSON
strings put into to maps. Putting an invalid JSON string in a map is
permissible. However, in that case
whether such an entry is going to be returned or not from a query is not defined.
Metadata Creation for JSON Querying
Hazelcast stores a metadata object per HazelcastJsonValue
stored.
This metadata object is created every time
a HazelcastJsonValue
is put into an IMap and stored in the on-heap or off-heap
memories depending on your IMap’s in-memory format setting.
Metadata is later used to speed up the query operations. Metadata creation
is on by default. Depending on your application’s needs, you may want
to turn off the metadata creation
to decrease the put latency and increase the throughput. You can configure
this using Metadata Policy.
Filtering with Paging Predicates
Hazelcast provides paging for defined predicates. With its PagingPredicate
interface, you can
get a collection of keys, values, or entries page by page by filtering
them with predicates and giving the size of the pages. Also, you
can sort the entries by specifying comparators. In this case, the comparator
should be Serializable
and the serialization factory implementations you use,
e.g., PortableFactory
and DataSerializableFactory
, should be registered.
See the Serialization chapter on how to register these
factories.
Paging predicates require the objects to be deserialized both on the calling side (either a member or client) and the member side from which the collection is retrieved. Therefore, you need to register the serialization factories you use on all the members and clients on which the paging predicates are used. See the Serialization chapter on how to register these factories.
In the example code below:
-
The
greaterEqual
predicate gets values from the "students" map. This predicate has a filter to retrieve the objects with an "age" greater than or equal to 18. -
Then a
PagingPredicate
is constructed in which the page size is 5, so that there are five objects in each page. The first time thevalues()
method is called, the first page is fetched. -
Finally, the subsequent page is fetched by calling the
nextPage()
method ofPagingPredicate
and querying the map again with the updatedPagingPredicate
.
IMap<Integer, Student> map = hazelcastInstance.getMap( "students" );
Predicate greaterEqual = Predicates.greaterEqual( "age", 18 );
PagingPredicate pagingPredicate = Predicates.pagingPredicate( greaterEqual, 5 );
// Retrieve the first page
Collection<Student> values = map.values( pagingPredicate );
...
// Set up next page
pagingPredicate.nextPage();
// Retrieve next page
values = map.values( pagingPredicate );
...
If a comparator is not specified for PagingPredicate
, but you want
to get a collection of keys or values page by page, keys or values must
be instances of Comparable
(i.e., they must implement java.lang.Comparable
).
Otherwise, the java.lang.IllegalArgument
exception is thrown.
You can also access a specific page more
easily with the help of the setPage()
method. This way, if you make
a query for the hundredth page, for example, it gets all 100 pages at
once instead of reaching the hundredth page one by one using the nextPage()
method.
Note that this feature tires the memory and see the
PagingPredicate Javadoc.
Paging Predicate, also known as Order & Limit, is not supported in Transactional Context. |
Filtering with Partition Predicate
You can run queries on a single partition in your cluster using
the partition predicate (PartitionPredicate
).
The Predicates.partitionPredicate()
method takes a predicate and partition key
as parameters, gets the partition ID using the key and runs that predicate only
on the partition where that key belongs.
See the following code snippet:
...
Predicate predicate = Predicates.partitionPredicate(partitionKey, Predicates.alwaysTrue());
Collection<Integer> values = map.values(predicate);
Collection<String> keys = map.keySet(predicate);
...
By default there are 271 partitions, and using a regular predicate, each partition needs to be accessed. However, if the partition predicate only accesses a single partition, this can lead to a big performance gain.
For the partition predicate to work correctly, you need to know which
partition your data belongs to so that you can send the
request to the correct partition. One of the ways of doing it is to
make use of the PartitionAware
interface when data is
inserted, thereby controlling the owning partition. See the
PartitionAware section for more information and examples.
A concrete example may be a web shop that sells phones and accessories.
To find all the accessories of a phone,
a query could be executed that selects all accessories for that phone.
This query is executed on all members in the cluster and
therefore could generate quite a lot of load. However, if we would store
the accessories in the same partition as the phone, the
partition predicate could use the partitionKey
of the phone to select
the right partition and then it queries for
the accessories for that phone; and this reduces the load on the system
and get faster query results.
Indexing Queries
Hazelcast distributed queries run on each member in parallel and return only the results to the caller. Then, on the caller side, the results are merged.
When a query runs on a
member, Hazelcast iterates through all the owned entries and finds the
matching ones. This can be made faster by indexing
the most-queried fields, just like you would do for your database.
Indexing adds overhead for each write
operation but reading will be a lot faster. If you query your map a
lot, make sure to add indexes for the most frequently
queried fields. For example, if you do active AND age < 30
query,
make sure you add an index for the active
and
age
fields. The following example code does that by getting the map
from the Hazelcast instance and adding indexes to the map with the
IMap addIndex
method.
IMap map = hazelcastInstance.getMap( "employees" );
// ordered, since we have ranged queries for this field
map.addIndex(new IndexConfig(IndexType.SORTED, "age"));
// not ordered, because boolean field cannot have range
map.addIndex(new IndexConfig(IndexType.HASH, "active"));
Note that creating indexes once is sufficient. Subsequent write
operations on the map are reflected in the index automatically. So,
although it is safe to call the addIndex()
method repeatedly, there
will be a performance penalty due to the redundant index creation.
When you call, for example, map.addIndex("fieldName", true)
, each
partition iterates over its records and adds each entry to the index.
The previously created index entry will be recreated and replaced with the new entry.
The performance penalty will be proportional to the number of entries. If you
have maps with a large number of entries, then synchronizing index addition process
is recommended.
Other than using the addIndex()
method, you can define your index
declaratively or programmatically as described in the Configuring IMap Indexes section.
Indexing Ranged Queries
IMap.addIndex(IndexConfig)
is used for adding index. For
each indexed field, if you have ranged queries such as age>30
,
age BETWEEN 40 AND 60
, then use IndexType.SORTED
index
Otherwise, use IndexType.HASH
.
Configuring IMap Indexes
Also, you can define IMap
indexes in configuration. An example is
shown below.
<hazelcast>
...
<map name="default">
<indexes>
<index type="HASH">
<attributes>
<attribute>name</attribute>
</attributes>
</index>
<index>
<attributes>
<attribute>age</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
hazelcast:
map:
default:
indexes:
- type: HASH
attributes:
- "name"
- attributes:
- "age"
<hz:map name="default">
<hz:indexes>
<hz:index type="HASH">
<hz:attributes>
<hz:attribute>name</hz:attribute>
</hz:attributes>
</hz:index>
<hz:index>
<hz:attributes>
<hz:attribute>age</hz:attribute>
</hz:attributes>
</hz:index>
</hz:indexes>
</hz:map>
You can also define IMap
indexes using programmatic configuration,
as in the example below.
mapConfig.addIndexConfig(new IndexConfig(IndexType.HASH, "name"));
mapConfig.addIndexConfig(new IndexConfig(IndexType.SORTED, "age"));
The following is the Spring declarative configuration for the same example.
Non-primitive types to be indexed should implement Comparable .
|
If you configure the data structure to use High-Density Memory Store and indexes, the indexes are automatically stored in the High-Density Memory Store as well. This prevents from running into full garbage collections when doing a lot of updates to index. |
Global and Partitioned Indexes
The on-heap indexes are always global, i.e., one index covers all IMap
s entries stored on the partitions
owned by a cluster member. Such indexes are beneficial for lookup and range queries because only one lookup
operation is needed to execute a query. A drawback of global indexes is a potentially high contention on the
index concurrent data structure that might cause performance degradation.
High-Density Memory Store supports partitioned indexes. Each partition owned by a cluster member has its own index. All operations on the partitioned index are performed on the partitioned thread, thus eliminating the contention issue of the global indexes. However, lookup and range queries have to perform lookup operations on every partition and combine the results. Normally, these partition and combine executions yield poorer performance results compared to the global indexes.
Global concurrent indexes (based on our own off-heap B+ Tree implementation)
bring all the benefits of global indexes to IMap
backed by High-Density Memory Store.
The global High-Density Memory Store indexes are enabled by default and controlled
by the hazelcast.hd.global.index.enabled
property. You can disable these indexes by setting
this property to false.
Composite Indexes
Composite indexes, also known as compound indexes, are special kind of indexes that are built on top of the multiple map entry attributes and therefore may be used to significantly speed up the queries involving those attributes simultaneously.
There are two distinct composite index types used for two different purposes: unordered composite indexes and ordered ones.
Unordered Composite Indexes
The unordered indexes are used to perform equality queries, also known
as the point queries, e.g., name = 'Alice'
. These are specifically
optimized for equality queries and don’t support other comparison operators
like >
or <=
.
Additionally, the composite unordered indexes allow speeding up the equality
queries involving multiple attributes simultaneously, e.g., name = 'Alice'
and age = 33
. This example query results in a single composite index lookup
operation which can be performed very efficiently.
The unordered composite index on the name
and age
attributes may be
configured for a map as follows:
<hazelcast>
...
<map name="persons">
<indexes>
<index type="HASH">
<attributes>
<attribute>name</attribute>
<attribute>age</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
hazelcast:
map:
default:
- type: HASH
attributes:
- "name"
- "age"
The attributes indexed by the unordered composite indexes can’t be
matched partially: the name = 'Alice'
query can’t utilize the composite
index configured above.
Ordered Composite Indexes
The ordered indexes are specifically designed to perform efficient order
comparison queries, also known as the range queries, e.g., age > 33
. The
equality queries, like age = 33
, are still supported by the ordered indexes,
but they are handled in a slightly less efficient manner comparing to the
unordered indexes.
The composite ordered indexes extend the concept by allowing multiple
equality predicates and a single order comparison predicate to be combined
into a single index query operation. For instance, the name = 'Alice' and
age > 33
and name = 'Bob' and age = 33 and balance > 0.0
queries are good
candidates to be covered by an ordered composite index configured as follows:
<hazelcast>
...
<map name="persons">
<indexes>
<index>
<attributes>
<attribute>name</attribute>
<attribute>age</attribute>
<attribute>balance</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
hazelcast:
map:
persons:
indexes:
- attributes:
- "name"
- "age"
- "balance"
Unlike the unordered composite indexes, partial attribute prefixes may be
matched for the ordered composite indexes. In general, a valid non-empty
attribute prefix is formed as a sequence of zero or more equality predicates
followed by a zero or exactly one order comparison predicate. Given the index
definition above, the following queries may be served by the index: name = 'Alice'
,
name > 'Alice'
, name = 'Alice' and age > 33
, name = 'Alice' and age = 33 and
balance = 5.0
. The following queries can’t be served the index: age = 33
,
age > 33 and balance = 0.0
, balance > 0.0
.
While matching the ordered composite indexes, multiple order comparison
predicates acting on the same attribute are treated as a single range
predicate acting on that attribute. Given the index definition above, the
following queries may be served by the index: name > 'Alice' and name < 'Bob'
,
name = 'Alice' and age > 33 and age < 55
, name = 'Alice' and age = 33 and
balance > 0.0 and balance < 100.0
.
Composite Index Matching and Selection
The order of attributes involved in a query plays no role in the selection
of the matching composite index: name = 'Alice' and age = 33
and
age = 33 and name = 'Alice'
queries are equivalent from the point of
view of the index matching procedure.
The attributes involved in a query can be matched partially by the composite
index matcher: name = 'Alice' and age = 33 and balance > 0.0
can be
partially matched by the name, age
composite index, the name = 'Alice'
and age = 33
predicates are served by the matched index, while the
balance > 0.0
predicate is processed by other means.
Bitmap Indexes
Bitmap indexes provide capabilities similar to unordered/hash indexes. The same set of predicates is supported:
-
equal
-
notEqual
-
in
, -
and
-
or
-
not
But, unlike hash indexes, bitmap indexes are able to achieve a much higher memory efficiency for low cardinality attributes at the cost of reduced query performance. In practice, the query performance is comparable to the performance of hash indexes, while memory footprint reduction is high, usually around an order of magnitude.
Bitmap indexes are specifically designed for indexing of collection and
array attributes since a single IMap
entry produces many index entries
in that case. A single hash index entry costs a few tens of bytes, while
a single bitmap index entry usually costs just a few bytes.
It’s also possible to improve the memory footprint while indexing regular single-value attributes, but the improvement is usually minor, depending on the data layout and total number of indexes.
Currently, bitmap indexes are not supported by off-heap High-Density Memory Stores (HD). |
Configuring Bitmap Indexes
In the simplest form, bitmap index for an IMap
entry attribute can be
declaratively configured as follows:
<hazelcast>
...
<map name="persons">
<indexes>
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
</index>
</indexes>
</map>
...
</hazelcast>
hazelcast:
map:
persons:
indexes:
- type: BITMAP
attributes:
- "age"
Internally, a unique non-negative long
ID is assigned to every
indexed IMap
entry based on the entry key. That unique ID is
required for bitmap indexes to distinguish one indexed IMap
entry from
another.
The mapping between IMap
entries and long
IDs is not free and its
performance and memory footprint can be improved in certain cases. For
instance, if IMap
entries already have a unique integer-valued
attribute, the attribute values can be used as unique long
IDs
directly without any additional transformations. That can be configured
as follows:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
<bitmap-index-options>
<unique-key>uniqueId</unique-key>
<unique-key-transformation>RAW</unique-key-transformation>
</bitmap-index-options>
</index>
indexes:
- type: BITMAP
attributes:
- "age"
bitmap-index-options:
unique-key: uniqueId
unique-key-transformation: RAW
The index definition above instructs Hazelcast to create a bitmap index
on the age
attribute, extract the unique key values from uniqueId
attribute
and use the raw (RAW
) extracted values directly as long
IDs. If the
extracted unique key value is not of long
type, the widening
conversion is performed for the following types: byte
, short
and
int
; boxed variants are also supported.
In certain cases, the extracted raw IDs might be randomly distributed. This causes increased memory usage in bitmap indexes since the best case scenario for them is sequential contiguous IDs. That can be countered by applying the renumbering technique:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
<bitmap-index-options>
<unique-key>uniqueId</unique-key>
<unique-key-transformation>LONG</unique-key-transformation>
</bitmap-index-options>
</index>
indexes:
- type: BITMAP
attributes:
- "age"
bitmap-index-options:
unique-key: uniqueId
unique-key-transformation: LONG
The index definition above instructs the bitmap index to extract the unique
keys from uniqueId
attribute, convert every extracted non-negative
value to long
(LONG
) and assign an internal sequential unique long
ID based on that extracted and then converted unique value. The widening
conversion is applied to the extracted values, if necessary.
This long-to-long mapping is performed more efficiently than the general object-to-long mapping done for the simple index definitions. Basically, the following simple bitmap index definition:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
</index>
indexes:
- type: BITMAP
attributes:
- "age"
is equivalent to the following full-form definition:
<index type="BITMAP">
<attributes>
<attribute>age</attribute>
</attributes>
<bitmap-index-options>
<unique-key>__key</unique-key>
<unique-key-transformation>OBJECT</unique-key-transformation>
</bitmap-index-options>
</index>
indexes:
- type: BITMAP
attributes:
- "age"
bitmap-index-options:
unique-key: __key
unique-key-transformation: OBJECT
Which indexes age
attribute, uses IMap
entry keys (__key
) interpreted
as Java objects (OBJECT
) to assign internal unique long
IDs.
The full-form definition syntax is defined as follows:
<index type="BITMAP">
<attributes>
<attribute><attr></attribute>
</attributes>
<bitmap-index-options>
<unique-key><key></unique-key>
<unique-key-transformation><transformation></unique-key-transformation>
</bitmap-index-options>
</index>
indexes:
- type: BITMAP
attributes:
- <attribute>
bitmap-index-options:
unique-key: <key>
unique-key-transformation: <transformation>
The following are the parameter descriptions:
-
<attr>
: Specifies the attribute index. -
<key>
: Specifies the attribute to use as a unique key source for internal uniquelong
ID assignment. -
<transformation>
: Specifies the transformation to be applied to unique keys to generate uniquelong
IDs from them. The following transformations are supported:-
OBJECT
: Object-to-long transformation. Each extracted unique key value is interpreted as a Java object instance. Internally, an object-to-long hash table is used to establish the mapping from unique keys to unique IDs. Good as a general-purpose transformation. -
LONG
: Long-to-long transformation. Each extracted unique key value is interpreted as a non-negativelong
value, the widening conversion frombyte
,short
andint
is performed, if necessary. Internally, a long-to-long hash table is used to establish the mapping from unique keys to unique IDs, which is more efficient than the object-to-long hash table. It is good for sparse/random unique integer-valued keys renumbering to raise the IDs density and to make the bitmap index more memory-efficient as a result. -
RAW
: Raw transformation. Each extracted unique key value is interpreted as a non-negativelong
value, the widening conversion frombyte
,short
andint
is performed, if necessary. Internally, no hash table of any kind is used to establish the mapping from unique keys to unique IDs, the raw extracted keys are used directly as IDs. It is good for dense unique integer-valued keys, and it has the best performance in terms of time and memory.
-
The regular dotted attribute path syntax is supported for <attr>
and
<key>
:
<index type="BITMAP">
<attributes>
<attribute>name.first</attribute>
</attributes>
</index>
<index type="BITMAP">
<attributes>
<attribute>name.first</attribute>
</attributes>
<bitmap-index-options>
<unique-key>__key.id</unique-key>
</bitmap-index-options>
</index>
<index type="BITMAP">
<attributes>
<attribute>name.first</attribute>
</attributes>
<bitmap-index-options>
<unique-key>id.external</unique-key>
</bitmap-index-options>
</index>
indexes:
- type: BITMAP
attributes:
- name.first
- type: BITMAP
attributes:
- name.first
bitmap-index-options:
unique-key: __key.id
- type: BITMAP
attributes:
- name.first
bitmap-index-options:
unique-key: id.external
Collection and array indexing is also possible using the regular syntax:
<index type="BITMAP">
<attributes>
<attribute>habits[any]</attribute>
</attributes>
</index>
<index type="BITMAP">
<attributes>
<attribute>habits[0]</attribute>
</attributes>
</index>
indexes:
- type: BITMAP
attributes:
- habits[any]
- type: BITMAP
attributes:
- habits[0]
See Indexing in Collections and Arrays section for more details.
Copying Indexes
The underlying data structures used by the indexes need to copy the query results to make sure that the results are correct. This copying process is performed either when reading the index from the data structure (on-read) or writing to it (on-write).
On-read copying means that, for each index-read operation, the result of the query is copied before it is sent to the caller. Depending on the query result’s size, this type of index copying may be slower since the result is stored in a map, i.e., all entries need to have the hash calculated before being stored. Unlike the index-read operations, each index-write operation is fast, since there is no copying. So, this option can be preferred in index-write intensive cases.
On-write copying means that each index-write operation completely copies the underlying map to provide the copy-on-write semantics and this may be a slow operation depending on the index size. Unlike index-write operations, each index-read operation is fast since the operation only includes accessing the map that stores the results and returning them to the caller.
Another option is never copying the results of a query to a separate map. This means the results backed by the underlying index-map can change after the query has been executed (such as an entry might have been added or removed from an index, or it might have been remapped). This option can be preferred if you expect "mostly correct" results, i.e., if it is not a problem when some entries returned in the query result set do not match the initial query criteria. This is the fastest option since there is no copying.
You can set one of these options using the system property
hazelcast.index.copy.behavior
. The following values, which are explained
in the above paragraphs, can be set:
-
COPY_ON_READ
(the default value) -
COPY_ON_WRITE
-
NEVER
The following is an example configuration snippet:
<hazelcast>
<cluster-name>dev</cluster-name>
...
<properties>
<property name="hazelcast.index.copy.behavior">NEVER</property>
</properties>
...
</hazelcast>
hazelcast:
cluster-name: dev
...
properties:
hazelcast.index.copy.behavior: NEVER
...
See also the Configuring with System Properties section for reference.
Usage of this system property is supported for BINARY and OBJECT in-memory formats. Only in Hazelcast 3.8.7, it is also supported for NATIVE in-memory format. |
Indexing Attributes with ValueExtractor
You can also define custom attributes that may be referenced in predicates,
queries and indexes. Custom attributes can be defined by implementing a
ValueExtractor
. See the Custom Attributes section
for details.
Using "this" as an Attribute
You can use the keyword this
as an attribute name while adding an
index or creating a predicate. A basic usage is shown below.
map.addIndex(new IndexConfig(IndexType.SORTED, "this"));
Predicate<Integer, Integer> lessEqual = Predicates.between("this", 12, 20);
Another basic example using SQL
predicate is shown below.
Predicates.sql("this = 'jones'")
Predicates.sql("this.age > 33")
The special attribute this
acts on the value of a map entry. Typically,
you do not need to specify it while accessing a property of an entry’s
value, since its presence is implicitly assumed if the special attribute
__key is not specified.
Configuring Query Thread Pool
You can change the size of thread pool dedicated to query operations
using the pool-size
property. Each query consumes a single thread
from a Generic Operations ThreadPool on each Hazelcast member - let’s
call it the query-orchestrating thread. That thread is blocked throughout
the whole execution-span of a query on the member.
The query-orchestrating thread uses the threads from the query-thread pool in the following cases:
-
if you run a
PagingPredicate
(since each page runs as a separate task) -
if you set the system property
hazelcast.query.predicate.parallel.evaluation
to true (since the predicates are evaluated in parallel)
See the Filtering with Paging Predicates section and System Properties appendix for information on paging predicates and for description of the above system property.
Below is an example of that declarative configuration.
<hazelcast>
...
<executor-service name="hz:query">
<pool-size>100</pool-size>
</executor-service>
...
</hazelcast>
hazelcast:
...
executor-service:
"hz:query":
pool-size: 100
Below is the equivalent programmatic configuration.
Config cfg = new Config();
cfg.getExecutorConfig("hz:query").setPoolSize(100);
Query Requests from Clients
When dealing with the query requests coming from the clients to your members, Hazelcast offers the following system properties to tune your thread pools:
-
hazelcast.clientengine.thread.count
which is the number of threads to process non-partition-aware client requests, likemap.size()
and executor tasks. Its default value is the number of cores multiplied by 20. -
hazelcast.clientengine.query.thread.count
which is the number of threads to process query requests coming from the clients. Its default value is the number of cores.
If there are a lot of query request from the clients, you may want to
increase the value of hazelcast.clientengine.query.thread.count
. In
addition to this tuning, you may also consider increasing the value of
hazelcast.clientengine.thread.count
if the CPU load in your system is
not high and there is plenty of free memory.