VectorCollection data structure design

The Hazelcast vector database engine is a specialized type of database optimized for storing, searching, and managing vector embeddings and additional metadata. You can include values in the metadata to support filtering, additional processing, or analysis of vectors.

For further information on the design of the Vector Collection data structure, see Vector Collection.

Architecture

The main components of vector search are illustrated in the following high-level diagram:

The high-level diagram of the main components

The components shown in the diagram are as follows:

  • Clients: methods and libraries for connecting and working with the vector database. Currently, two language clients are available: Java and Python.

  • Collections: a set consisting of one or more indexes and one metadata storage. For further information about vector collections, see Vector Collection.

  • Indexes: named sets of vectors. Vectors within the same index must have the same dimensions and be compared using the same metric (a configuration sketch follows this list). For further information on indexes, see Index Overview.

  • Metadata storage: key-value storage for storing metadata values. The key and value can be any objects; for example, JSON, unstructured text, or a Java-serialized POJO. For more information about the available storage types, see Serializing Objects and Classes.
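
As an illustration, the following sketch declares a collection with a single index programmatically. It assumes the Java configuration API (VectorCollectionConfig, VectorIndexConfig, Metric); the collection name, index name, dimension, and metric are placeholder values, so treat this as a sketch rather than a reference:

    Config config = new Config();
    config.addVectorCollectionConfig(
            new VectorCollectionConfig("books")                  // collection name (placeholder)
                    .addVectorIndexConfig(new VectorIndexConfig()
                            .setName("word-vectors")             // index name (placeholder)
                            .setDimension(300)                   // all vectors in this index share this dimension
                            .setMetric(Metric.COSINE)));         // and are compared using this metric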

An overview of vector collection is illustrated below:

Vector collection overview

A user-defined key, assigned when the vector is added to the collection, links the vector to its metadata. Each key corresponds to a single vector from each index.
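
For illustration, the following sketch adds a metadata value and a vector under the same user-defined key. It assumes the Java client API (VectorCollection, VectorDocument, VectorValues); see Vector Collection for the authoritative examples and treat the calls here as a sketch:

    // hazelcastInstance is an existing member or client instance.
    VectorCollection<String, String> collection =
            VectorCollection.getCollection(hazelcastInstance, "books");

    // The key "1" links the metadata value to the vector stored in the index.
    collection.putAsync("1",
            VectorDocument.of("{\"title\": \"...\"}",
                    VectorValues.of(new float[]{0.1f, 0.2f, 0.3f})))
            .toCompletableFuture()
            .join();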

Index

Essentially, each index serves as a distinct vector space. In terms of storage, the index is a graph where each node represents a vector, and the edges are organized to optimize search efficiency.

The index is based on the JVector library, which implements the DiskANN algorithm for similarity search.

Partitioning and Replication

Each collection is partitioned and replicated based on the system’s general partitioning rules. Data partitioning is carried out using the collection key.

Vector collection is an AP data structure and implements the standard in-memory backup types. Vector collection does not currently support reading from backups or file-based backups.

For further information on Hazelcast partitioning, see Data Partitioning and Replication.

Data store

Hazelcast stores data in-memory (RAM) for faster access. Presently, the only available data storage option is the JVM heap store.

Fault Tolerance

Hazelcast distributes storage data across all cluster members. In the event of a graceful shutdown, the data is migrated to the remaining active members. In the event of a member crash, if backups were configured, they are used to restore the data. If no backups were configured, data can be lost.

You can disable backups to speed up data ingestion into Vector Collection and reduce memory usage in development and test environments that do not require fault tolerance. With a single-member cluster (for example, an embedded dev cluster), you do not need to disable backups because they are not used in this case.
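
The following is a minimal sketch of disabling backups for a collection in such an environment. It assumes that VectorCollectionConfig exposes the standard backup-count setters; see Vector Collection for the authoritative configuration reference:

    Config config = new Config();
    config.addVectorCollectionConfig(
            new VectorCollectionConfig("books")   // placeholder collection name
                    .setBackupCount(0)            // no synchronous backups
                    .setAsyncBackupCount(0));     // no asynchronous backups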

Vector indexes are partitioned, so when you execute a similarity search, all partitions need to be searched and the partial results aggregated. This process impacts search performance and recall:

  • considering more candidate results usually yields better recall but uses more resources,

  • staged execution affects latency because there is more communication and some stages need to wait for all partial results before proceeding.

The default search algorithm is a two-stage search which works as follows:

  1. The member that received the query becomes a coordinator.

  2. The coordinator distributes the search request to each member (including the coordinator member). Each member is tasked with returning results from partitions it owns.

  3. Each member executes the search on owned partitions in parallel and aggregates the partition results.

  4. Each member returns the partially aggregated results to the coordinator.

  5. The coordinator aggregates the partial results and generates the final result. If required, searches on some partitions can be retried individually. For example, this can be useful for migrations, when members leave the cluster, or to resolve errors.

At each stage, aggregation is based on score and only the best results are retained.

Two parameters in this search algorithm determine the amount of data sent between the members and the quality of the final result. These parameters are as follows:

  • partitionLimit - number of search results obtained from each partition

  • memberLimit - number of search results returned from member to coordinator

To allow the system to return enough results, the following conditions must be satisfied where topK denotes the number of entries requested:

  • partitionLimit * partitionCount >= topK, partitionLimit <= topK

  • memberLimit * memberCount >= topK, memberLimit <= topK

  • efSearch >= partitionLimit; if partitionLimit is not configured explicitly, this applies to the default partitionLimit value
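
The following sketch is plain arithmetic (not a Hazelcast API) and shows, for assumed values, the smallest limits that still satisfy these conditions:

    int topK = 10;             // number of requested results
    int partitionCount = 271;  // default Hazelcast partition count
    int memberCount = 3;       // assumed cluster size

    // Smallest values that still allow topK results to be produced:
    int minPartitionLimit = (int) Math.ceil((double) topK / partitionCount);  // = 1
    int minMemberLimit = (int) Math.ceil((double) topK / memberCount);        // = 4

    // Any partitionLimit in [1, 10] and memberLimit in [4, 10] satisfies the
    // conditions above, provided that efSearch >= partitionLimit.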

By default, memberLimit is equal to topK and partitionLimit is calculated based on topK and cluster configuration (number of partitions) in a way that is unlikely to cause quality degradation.

Consider tuning efSearch based on quality and throughput/latency requirements.

Heuristics for partitionLimit assume that data (vectors) is distributed uniformly in partitions. If this is not the case, for example if the closest neighbours reside only in a single or a few partitions, the default value of partitionLimit may negatively impact search quality. In such a case consider increasing the partitionLimit.

memberLimit is less critical for overall behavior if there are only a few members.

Diagram

A simplified search algorithm can be used, which does not perform intermediate aggregation of results at the member level. It is used when the cluster has only a single member, or can be enabled using a search hint.

A single-stage search request is executed in parallel on all partitions (on their owners) and partition results are aggregated directly on the coordinator member to produce the final result.

This search algorithm uses the efSearch and partitionLimit parameters, which behave in the same way as for the two-stage search.

Diagram

Partition count impact

The number of partitions has a significant impact on the performance of the vector collection. The conflicting factors that can affect the selection of an optimal partition count are as follows:

  • data ingestion: a greater number of partitions results in improved parallelism, up to around the total number of partition threads in the cluster. Beyond this point, more partitions do not significantly improve ingestion speed. If there are fewer partitions than cores, not all available resources are utilized during ingestion because updates on a given partition are executed by a single thread.

  • similarity search: in general, having fewer partitions results in better search performance and reduced latency. However, the impact on quality/recall is more complex and also depends on the efSearch and partitionLimit values.

  • migration: avoid partitions with a large memory size, including metadata, vectors, and the vector index internal representation. In general, the recommendation is a partition size of around 50-100 MB, which results in fast migrations and low pressure on the heap during migration (see the arithmetic sketch after this list). However, for vector search, the partition size can be increased above that general recommendation provided that there is enough heap memory for migrations (see below).

  • other data structures: the number of partitions is a cluster-wide setting shared by all data structures. If their needs are vastly different, you might consider creating separate clusters.
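
As an illustration of the migration guideline above, the following arithmetic-only sketch estimates the per-partition size for assumed figures:

    double totalPrimaryDataBytes = 20e9;  // assumed 20 GB of primary vector collection data
    int partitionCount = 271;             // default Hazelcast partition count

    // ~74 MB per partition, within the 50-100 MB guideline:
    double bytesPerPartition = totalPrimaryDataBytes / partitionCount;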

It is not possible to change the number of partitions for an existing cluster.
For this Beta version, the following recommendations apply:

The entire collection partition is migrated as a single chunk. If using partitions that are larger than the recommended size, ensure that you have sufficient heap memory to run migrations. The amount of heap memory required is approximately the size of the vector collection partition multiplied by the number of parallel migrations. To decrease pressure on heap memory, you can decrease the number of parallel migrations using hazelcast.partition.max.parallel.migrations and hazelcast.partition.max.parallel.replications.
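
The following is a minimal sketch of lowering migration and replication parallelism using the properties named above; the value 1 is only an illustrative choice:

    Config config = new Config();
    config.setProperty("hazelcast.partition.max.parallel.migrations", "1");
    config.setProperty("hazelcast.partition.max.parallel.replications", "1");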

Capacity planning

Memory usage of Vector Collection

A few major elements contribute to Vector Collection memory requirements:

  • entry keys and values: memory footprint = entry count * (avg key size + avg value size). The sizes should be calculated in serialized form.

  • vectors: memory footprint = entry count * dimensions * 4 (4 being the size of a float coordinate). The formula is for a collection with one vector index; with multiple vector indexes, each index has to be added separately. Deduplication does not affect the memory usage significantly unless there are many duplicated vectors, in which case the footprint could be smaller.

  • vector index: memory footprint = entry count * max degree * 8 (this is a simplification, but should be a good-enough approximation).

  • additional overheads: these can be estimated as approximately 308 bytes per entry in a collection with one vector index. The precise value depends on the architecture, JVM settings, whether deduplication is enabled, and other factors. The value given here is for compressed OOPs and the default 8-byte object alignment, with deduplication enabled.

There are other factors and overheads, but they are smaller.

When entries in Vector Collection are updated or removed, old vectors and nodes in the vector index remain in memory until optimization is performed. Enough heap space should be available for them.

If backups are used (by default there is 1 backup), they also need to be included in the computation. Each backup (sync and async) needs the same amount of memory as primary data. This brings us to the final formula for memory footprint:

total memory footprint = (1 + backup count) * entry count * (avg key size + avg value size + dimensions * 4 + max degree * 8 + 308)
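
A worked example of this formula, with all figures assumed for illustration: 1,000,000 entries, 768-dimensional vectors, max degree 32, one backup, roughly 100-byte serialized keys and 500-byte serialized values:

    long entryCount = 1_000_000L;
    long perEntryBytes = 100 + 500 + 768 * 4 + 32 * 8 + 308;  // = 4,236 bytes per entry
    long totalBytes = (1 + 1) * entryCount * perEntryBytes;   // ≈ 8.5 GB total memory footprint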

In many cases, in particular with default settings and dimensions > 1500, or with larger metadata (at least 5 kB), the vector index size contributes less than 10% to the total memory usage. In these cases, the formula can be simplified and expressed in terms of raw data size (vectors and metadata):

total memory footprint = (1 + backup count) * raw data size * 1.1

All this data has to be distributed across the cluster members. General recommendations from Hazelcast capacity planning also apply to Vector Collection. Keep enough headroom (40% of free heap memory is recommended) to accommodate member failure or shutdown and ensure that the cluster has enough resources to operate after member failures.

If all members have the same heap size, the vector collection needs a cluster of at least the following size:

number of members = ceil(total memory footprint / (heap size * 60%))
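
Continuing the illustrative example above (about 8.5 GB total footprint) with members that each have an assumed 8 GB heap:

    double totalMemoryFootprintGb = 8.5;  // from the worked example above
    double heapGb = 8.0;                  // assumed heap size per member

    // ceil(8.5 / 4.8) = 2 members
    int numberOfMembers = (int) Math.ceil(totalMemoryFootprintGb / (heapGb * 0.6));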

If other data, for example IMaps, is also stored in the cluster, this must also be accounted for.

Tuning tips

  1. Enable Vector API.

  2. Prefer the DOT metric with normalized vectors over the COSINE metric if your use case does not require the COSINE metric.

  3. Adjust efSearch to achieve the desired balance between throughput/latency and precision. By default efSearch = topK. For searches with small topK (for example, 1 - 10), it may be beneficial to use a larger value to get better precision. For large topK (for example 100), a smaller efSearch value will give better performance with only a potentially small and acceptable decrease in precision.

  4. Test if adjusting efSearch gives satisfactory results before increasing index build parameters (max-degree, ef-construction), which would result in slower index builds and searches, and a larger index. The configuration sketch after this list shows where these parameters are set.

  5. Vector deduplication does not incur significant overhead for uploads (usually less than 1%) and searches. You may consider disabling it to get slightly better performance and smaller memory usage if your dataset does not contain duplicated vectors. However, be aware that in the presence of many duplicated vectors with deduplication disabled, a similarity search may return poor quality results.

  6. For a given query, each vector index partition is searched by one thread. The number of concurrent partition searches is determined by hz:vector-query executor configuration, which by default has a pool size equal to half of the virtual core count of your host machines (effectively, for CPUs with HyperThreading, the pool size is equal to the number of physical cores). This results in a good balance between search throughput and CPU utilization. When HyperThreading is not enabled or available (in particular on ARM CPUs), increasing the default hz:vector-query pool size can result in a significant boost in performance. Setting hz:vector-query to have a pool size greater than physical core count will not deliver a significant increase in throughput but will increase total CPU utilization. The hz:vector-query pool size can be changed as follows:

    Java:

    Config cfg = new Config();
    cfg.getExecutorConfig("hz:vector-query").setPoolSize(100);

    XML:

    <hazelcast>
        ...
        <executor-service name="hz:vector-query">
            <pool-size>100</pool-size>
        </executor-service>
        ...
    </hazelcast>

    YAML:

    hazelcast:
      ...
      executor-service:
        "hz:vector-query":
          pool-size: 100
  7. Decreasing the number of partitions can improve query performance but has a significant impact on the entire cluster.

  8. If there are fewer partitions than available cores, not all cores will be used for a single search execution. This is acceptable if you are focused on throughput, because in general fewer partitions mean you need fewer resources. However, if you want to achieve the best latency for a single client, it is better to distribute the search across as many cores as possible, which requires having at least as many partitions as cores in the cluster.

  9. The vectorCollection.searchIndexVisitedNodes metric can be helpful for understanding vector search performance. If the ratio of the number of nodes visited per search to the collection size is high, this may indicate that the vector index is not beneficial in the given case.
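
The following sketch, referenced from tip 4, shows where the index build parameters and deduplication setting are configured. All values are placeholders and the setter names assume the Java configuration API; see Vector Collection for the full configuration reference:

    Config config = new Config();
    config.addVectorCollectionConfig(
            new VectorCollectionConfig("books")                  // placeholder collection name
                    .addVectorIndexConfig(new VectorIndexConfig()
                            .setName("word-vectors")
                            .setDimension(300)
                            .setMetric(Metric.DOT)               // tip 2: DOT with normalized vectors
                            .setMaxDegree(32)                    // tip 4: index build parameter
                            .setEfConstruction(128)              // tip 4: index build parameter
                            .setUseDeduplication(false)));       // tip 5: only if the dataset contains no duplicated vectors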