Glossary

2-phase commit

An atomic commitment protocol for distributed systems. It consists of two phases: commit-request and commit. In the commit-request phase, the transaction manager asks each transaction resource to prepare and vote on whether to commit or abort. In the commit phase, the transaction manager finalizes the operation by committing or aborting, according to the votes of the transaction resources.
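
A minimal sketch of the two phases, assuming a generic, hypothetical Participant interface (not a Hazelcast API), with the coordinator collecting votes before finalizing:

```java
import java.util.List;

// Hypothetical participant interface for illustration only.
interface Participant {
    boolean prepare();   // vote: true = ready to commit, false = abort
    void commit();
    void rollback();
}

class TwoPhaseCommitCoordinator {
    boolean execute(List<Participant> participants) {
        // Phase 1: commit-request (voting) - ask every resource to prepare
        boolean allVotedYes = participants.stream().allMatch(Participant::prepare);

        // Phase 2: commit - finalize according to the collected votes
        if (allVotedYes) {
            participants.forEach(Participant::commit);
        } else {
            participants.forEach(Participant::rollback);
        }
        return allVotedYes;
    }
}
```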

ACID

A set of properties (Atomicity, Consistency, Isolation, Durability) guaranteeing that transactions are processed reliably. Atomicity requires that each transaction be all or nothing: if one part of the transaction fails, the entire transaction fails. Consistency ensures that only valid data following all rules and constraints is written. Isolation ensures that concurrent transactions are processed independently and without interference, although it does not guarantee the order in which transactions execute. Durability means that once a transaction has been committed, it remains committed, even in the event of a power loss, crash, or error.

BFF

Backend-for-frontend. A web development architectural pattern that creates a dedicated backend for each frontend application.

Cache

A high-speed access area that can be either a reserved section of main memory or a storage device.

Cache access patterns

Strategies for optimizing memory access to improve performance and reduce latency. The most common access patterns are read-through, write-through, and write-behind.
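
A minimal read-through sketch: on a cache miss, the value is loaded from the backing store and placed in the cache. The loader function standing in for a database call is a hypothetical example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class ReadThroughCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader; // stands in for a database lookup

    ReadThroughCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent loads and caches the value only on a miss
        return cache.computeIfAbsent(key, loader);
    }
}
```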

Caching

Often referred to as memory caching, this is a technique that allows computer applications to temporarily store data in a computer’s memory for quick retrieval.

Cache miss

An event in which a system or application makes a request to retrieve data from a cache, but that specific data is not currently in cache memory.

CDC

Change data capture is a data pipeline pattern for observing changes made to a database and extracting them in a form usable by other systems, for the purposes of replication, analysis and more.

CEP

Complex event processing is a set of techniques for capturing and analyzing streams of data as they arrive to identify opportunities or threats in real time.

CLC

Hazelcast Command Line Client, used to connect to and interact with clusters on Hazelcast Cloud and Hazelcast Platform directly from the command line or through scripts.

CLI

Command-line interface.

Client/server topology

Hazelcast topology where members run outside the user application and are connected to clients using client libraries. The client library is installed in the user application.
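
A sketch of a client connecting to a remote cluster with the Hazelcast Java client library; the member address and map name are placeholder values.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;

public class ClientExample {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        config.getNetworkConfig().addAddress("127.0.0.1:5701"); // address of a member

        // The client runs inside the user application; data lives on the members
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        IMap<String, String> map = client.getMap("example-map");
        map.put("greeting", "hello");

        client.shutdown();
    }
}
```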

Cloud-native architecture

An architecture in which an application is designed specifically for cloud deployment (in contrast to “lift-and-shift”, where applications are moved to the cloud unchanged).

DaaS

Data as a Service is a data management strategy and/or deployment model that focuses on the cloud (public or private) to deliver a variety of data-related services such as storage, processing, and analytics.

Data-at-rest

Data that is stored in digital form on a physical device and is not being actively accessed or used.

Data grid

A set of computers that directly interact with each other to coordinate the processing of large jobs.

Data-in-motion

A term that refers to digital data sets that are currently being moved over a network from one location to another.

DAG

A directed acyclic graph is a conceptual representation of a series of activities depicted by a graph.

Data pipeline

A series of actions that ingest data from one or more sources and move it to a destination for storage and analysis.

Data source

Where data originates from — for example, in the case of Hazelcast Flow, the databases, APIs, and messages that fuel it.

Deserialization

The process of reconstructing a data structure or object from a series of bytes or a string in order to instantiate the object for consumption.

DIH

A digital integration hub is a software architecture for a data layer that provides centralized access to a collection of data from disparate sources.

Distributed cache

A system that pools together the random-access memory (RAM) of multiple networked computers into a single in-memory data store used as a data cache to provide fast access to data.

Distributed computing

Sometimes called distributed processing, this is the technique of linking together multiple computer servers over a network into a cluster, to share data and to coordinate processing power.

Distributed hash table

A decentralized data store that looks up data based on key-value pairs.

Distributed transaction

A set of operations on data that is performed across two or more data repositories (especially databases).

EDA

Event-driven architecture is a modern software design approach centered on events: data that describes occurrences such as the selection of a button on a user interface (UI), the addition of an item to an online shopping cart, or a payment notification on a point-of-sale (POS) system. EDA enables applications to act on these events in real time, as they occur.

Edge computing

Sometimes called IoT edge processing, this refers to taking action on data as near to the source as possible rather than in a central, remote data center, to reduce latency and bandwidth use.

Embedded topology

Hazelcast topology where the members are in-process with the user application and act as both client and server.
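
A sketch of the embedded topology, where the member starts in-process with the application so the same JVM both stores data and runs the application code; the cluster and map names are placeholders.

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class EmbeddedExample {
    public static void main(String[] args) {
        Config config = new Config();
        config.setClusterName("example-cluster");

        // The member runs inside this JVM, alongside the application
        HazelcastInstance member = Hazelcast.newHazelcastInstance(config);
        member.getMap("example-map").put("greeting", "hello");

        member.shutdown();
    }
}
```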

ESP

Event stream processing is the practice of taking action on a series of data points that originate from a system that continuously creates data. The term “event” refers to each data point in the system, and “stream” refers to the ongoing delivery of those events.

ETL

Extract transform load is a data pipeline pattern for collecting data from various sources, transforming (changing) it to conform to some rules, and loading it into a sink.

Flow

Hazelcast Flow is a data gateway that automates the integration of microservices across an enterprise. Flow accelerates application development by connecting multiple data sources and APIs together, without having to write integration code.

Garbage collection

The recovery of storage that is being used by an application when that application no longer needs the storage. This frees the storage for use by other applications (or processes within an application). It also ensures that an application using increasing amounts of storage does not reach its quota. Programming languages that use garbage collection are often run within virtual machines such as the JVM; in that case, the environment that runs the code is also responsible for garbage collection.

Grid computing

The practice of leveraging multiple computers, often geographically distributed but connected by networks, to work together to accomplish joint tasks.

Hazelcast cluster

A virtual environment formed by Hazelcast members communicating with each other in the cluster.

Hazelcast partition

Memory segments containing the data. Hazelcast is built on the partition concept, and uses partitions to store and process data. Each partition can have hundreds or thousands of data entries depending on your memory capacity. You can think of a partition as a block of data. In general and optimally, a partition should have a maximum size of 50-100 megabytes.
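
A conceptual sketch of how a key maps to a partition: the key's hash is taken modulo the partition count (271 by default in Hazelcast). This illustrates the idea only and is not Hazelcast's exact internal hashing.

```java
public class PartitionSketch {
    static int partitionId(Object key, int partitionCount) {
        // Math.floorMod keeps the result non-negative for negative hash codes
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        // The same key always lands in the same partition
        System.out.println(partitionId("customer-42", 271));
    }
}
```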

Hibernate second-level cache

One of the data caching components available in the Hibernate object-relational mapping (ORM) library. Hibernate is a popular ORM library for the Java language, and it lets you store your Java object data in a relational database management system (RDBMS).

IaaS

Infrastructure as a Service is a cloud-based service offering in which the vendor provides compute, network and storage resources and the customer provides the operating system and application software.

IaC

Infrastructure as Code. A method of managing and provisioning IT infrastructure using code, rather than manual processes.

IMDB

In-memory database. A computer system that stores and retrieves data records that reside in a computer’s main memory, e.g., random-access memory (RAM).

IMDG

An in-memory data grid (IMDG) is a data structure that resides entirely in memory and is distributed among many members in a single location or across multiple locations. IMDGs can support thousands of in-memory data updates per second and they can be clustered and scaled in ways that support large quantities of data.

Inference runner

A component in large-scale software systems that lets you plug in machine learning (ML) algorithms (or “models”) to deliver data into those algorithms and calculate outputs.

In-memory computation

Also called in-memory computing, this is the technique of running computer calculations entirely in computer memory (e.g., in RAM).

In-memory processing

The practice of taking action on data entirely in computer memory (e.g., in RAM).

Java heap

The space that Java can reserve and use in memory for dynamic memory allocation. All runtime objects created by a Java application are stored in the heap. By default, the heap size is 128 MB, but this limit is reached easily for business applications. Once the heap is full, new objects cannot be created and the Java application throws errors such as OutOfMemoryError.

Java microservices

A set of software applications written in the Java programming language (and which typically leverage the vast ecosystem of Java tools and frameworks), designed for a limited scope, that work with each other to form a bigger solution.

JCache/Java cache

A de facto standard Java cache API for caching data.
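
A minimal sketch using the standard javax.cache (JSR-107) API, assuming a compliant caching provider (Hazelcast among them) is on the classpath; the cache name and entries are placeholders.

```java
import javax.cache.Cache;
import javax.cache.CacheManager;
import javax.cache.Caching;
import javax.cache.configuration.MutableConfiguration;
import javax.cache.spi.CachingProvider;

public class JCacheExample {
    public static void main(String[] args) {
        // Resolves whichever JSR-107 provider is on the classpath
        CachingProvider provider = Caching.getCachingProvider();
        CacheManager manager = provider.getCacheManager();

        MutableConfiguration<String, Integer> config =
                new MutableConfiguration<String, Integer>()
                        .setTypes(String.class, Integer.class);

        Cache<String, Integer> cache = manager.createCache("scores", config);
        cache.put("alice", 42);
        System.out.println(cache.get("alice"));

        manager.close();
    }
}
```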

Job

A data pipeline that’s packaged and submitted to a cluster member to run.
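
A minimal sketch of defining a pipeline and submitting it to a member as a job, assuming the Hazelcast Platform 5.x Jet pipeline API; the source items, mapping step, and logger sink are illustrative.

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.jet.pipeline.Pipeline;
import com.hazelcast.jet.pipeline.Sinks;
import com.hazelcast.jet.pipeline.test.TestSources;

public class JobExample {
    public static void main(String[] args) {
        // Define the pipeline: source -> transform -> sink
        Pipeline pipeline = Pipeline.create();
        pipeline.readFrom(TestSources.items(1, 2, 3, 4))
                .map(n -> n * 10)
                .writeTo(Sinks.logger());

        // The Jet engine is disabled by default in Platform 5.x members
        Config config = new Config();
        config.getJetConfig().setEnabled(true);

        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
        hz.getJet().newJob(pipeline).join(); // submit the pipeline as a job
        hz.shutdown();
    }
}
```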

JWT

JSON Web Token, an open standard for transmitting information securely between parties as a JSON object.

K8s

Kubernetes. An open source system that manages and deploys containerized applications.

Kappa

The Kappa Architecture is a software architecture used for processing streaming data.

Key-value store

A type of data storage software program that stores data as a set of unique identifiers, each of which have an associated value.

Lambda architecture

A deployment model for data processing that organizations use to combine a traditional batch pipeline with a fast real-time stream pipeline for data access.

Least recently used (LRU)

A cache eviction algorithm where the entries that have not been accessed for the longest time are evicted first.

Least frequently used (LFU)

A cache eviction algorithm where the entries that are accessed least frequently are evicted first.
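
A minimal LRU eviction sketch using a LinkedHashMap in access order, where the least recently accessed entry is dropped once capacity is exceeded; the capacity and types are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true tracks recency of access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}
```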

Lite member

A member that does not store data and has no partitions. These members are often used to execute tasks and register listeners.

Machine learning (ML) inference

The process of running live data points into a machine learning algorithm (or “ML model”) to calculate an output such as a single numerical score.

Management Center

A tool for managing and monitoring Hazelcast Platform clusters.

Member

A Hazelcast instance. Depending on your Hazelcast topology, it can refer to a server or a Java virtual machine (JVM). Members belong to a Hazelcast cluster. Members may also be referred to as member nodes, cluster members, Hazelcast members, or data members.

Micro-batch processing

The practice of collecting data in small groups (“batches”) for the purposes of taking action on (processing) that data.

Microservices

A set of software applications designed for a limited scope that work with each other to form a bigger solution.

Microservices architecture

A software architecture approach in which a set of software applications designed for a limited scope, known as microservices, work together to form a bigger solution.

mTLS

Mutual TLS (Transport Layer Security) authentication. A method that ensures the authenticity of the parties at each end of a network connection.

Multicast

A type of communication in which data is sent to a defined group of destination members simultaneously (one to many). Distinct from unicast (one to one) and broadcast (one to all).

Mutation

In Hazelcast Flow, mutation queries make changes; for example, in performing a task or updating a record.

Near cache

A caching model in which an object retrieved from a remote member is put into a local cache, so that future requests for the object are served from that local cache.
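
A sketch of enabling a near cache on a Hazelcast Java client, so repeated reads of the same keys are served locally; the map name and default near-cache settings are illustrative.

```java
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.NearCacheConfig;
import com.hazelcast.core.HazelcastInstance;

public class NearCacheExample {
    public static void main(String[] args) {
        ClientConfig clientConfig = new ClientConfig();
        // Enable a near cache for the map named "example-map"
        clientConfig.addNearCacheConfig(new NearCacheConfig("example-map"));

        HazelcastInstance client = HazelcastClient.newHazelcastClient(clientConfig);
        // The first get() fetches from the owning member; subsequent get() calls
        // for the same key are served from the local near cache.
        client.getMap("example-map").get("greeting");
        client.shutdown();
    }
}
```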

NoSQL

"Not Only SQL". A database model that provides a mechanism for storage and retrieval of data that is tailored in means other than the tabular relations used in relational databases. It is a type of database which does not adhere to the traditional relational database management system (RDMS) structure. It is not built on tables and does not employ SQL to manipulate data. It also may not provide full ACID guarantees, but still has a distributed and fault-tolerant architecture.

OIDC

OpenID Connect, an identity layer built on top of the OAuth 2.0 protocol; an OIDC provider authenticates users on behalf of applications.

OOME

OutOfMemoryError, an error thrown by the Java Virtual Machine when it cannot allocate an object because it has run out of memory.

Operator

Hazelcast Platform Operator simplifies working with Hazelcast clusters on Kubernetes and Red Hat OpenShift by eliminating the need for manual deployment and life-cycle management.

OSGI

Formerly known as the Open Services Gateway initiative, it describes a modular system and a service platform for the Java programming language that implements a complete and dynamic component model.

PaaS

Platform as a Service is a cloud-based service offering in which the vendor provides hardware resources (as in IaaS), operating systems and management tools, and the customer provides the application software.

Partition table

Table containing all members in the cluster, mappings of partitions to members and further metadata.

PKCE

Proof Key for Code Exchange. An extension used in OAuth 2.0 to improve security for public clients.

Projection

In Hazelcast Flow, projections are a way of taking data from one place, and then transforming and combining it with other data sources.

Publish/subscribe

A software architecture model by which applications create and share data. Pub/sub is particularly popular in serverless and microservices architectures.
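
A minimal publish/subscribe sketch using a Hazelcast topic, where one part of the application subscribes and another publishes; the topic name and message are placeholders.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.topic.ITopic;

public class PubSubExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        ITopic<String> topic = hz.getTopic("orders");

        // Subscriber: invoked for every message published to the topic
        topic.addMessageListener(message ->
                System.out.println("Received: " + message.getMessageObject()));

        // Publisher: delivers the message to all current subscribers
        topic.publish("order-created");

        hz.shutdown();
    }
}
```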

Query

A request built using Hazelcast Flow's ability to retrieve and analyze data from different sources across an ecosystem. For example, a query might combine three services together: a database, API and Kafka streaming data.

Race condition

A condition that occurs when two or more threads access shared data and try to change it at the same time, so the outcome depends on the timing of the threads.
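
A minimal sketch of a race condition in plain Java: two threads increment a shared, unsynchronized counter, so updates can be lost and the final value is often less than 2000.

```java
public class RaceConditionExample {
    static int counter = 0; // shared mutable state, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Runnable increment = () -> {
            for (int i = 0; i < 1000; i++) {
                counter++; // read-modify-write is not atomic
            }
        };
        Thread t1 = new Thread(increment);
        Thread t2 = new Thread(increment);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(counter); // frequently < 2000 due to lost updates
    }
}
```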

Real-time database

A data store designed to collect, process, and/or enrich an incoming series of data points (i.e., a data stream) in real time, typically immediately after the data is created.

Real-time machine learning

The process of training a machine learning model by running live data through it, to continuously improve the model.

Real-time stream processing

The process of taking action on data at the time the data is generated or published.

RSA

A public-key cryptographic algorithm used to generate key pairs and to encrypt and decrypt data, commonly keys exchanged for secure data transmissions.

SAML

Security Assertion Markup Language. An open standard in which an identity provider (IdP) authenticates users and passes authentication data to a service provider.

Semantic data type

A method of encoding data that allows software to discover and map data based upon its meaning rather than its structure.

Serialization

A process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed.
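
A minimal sketch of serialization and the matching deserialization (see the Deserialization entry) using standard Java serialization; the Point record is a placeholder type and assumes Java 16 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationExample {
    record Point(int x, int y) implements Serializable {}

    public static void main(String[] args) throws Exception {
        // Serialize: convert the object into a stream of bytes
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(3, 4));
        }

        // Deserialize: reconstruct the object from those bytes
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Point restored = (Point) in.readObject();
            System.out.println(restored);
        }
    }
}
```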

Sharding

The practice of optimizing database management systems by separating the rows or columns of a larger database table into multiple smaller tables.

Snapshot

A distributed map that contains the saved state of a job’s computations.

Split-brain

A state in which a cluster of members gets divided (or partitioned) into smaller clusters of members, each of which believes it is the only active cluster.

SSE

Server-sent events.

Streaming data

Also known as real-time data, event data, or data-in-motion, this refers to a continuous flow of information generated by various sources, such as sensors, applications, social media, or other digital platforms.

Streaming database

A data store designed to collect, process, and/or enrich an incoming series of data points (i.e., a data stream) in real time, typically immediately after the data is created.

Streaming ETL (Extract, Transform, Load)

The processing and movement of real-time data from one place to another.

TaxiQL

In Hazelcast Flow, Taxi is a simple query language for describing how data and APIs across an ecosystem relate to one another.

Taxonomy

The practice of classifying and categorizing data.

Time to live (TTL)

A value that determines how long data is retained before it is discarded from the internal cache.
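
A sketch of setting a per-entry time-to-live on a Hazelcast map, so the entry is discarded automatically once the TTL expires; the map name, key, and value are placeholders.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import java.util.concurrent.TimeUnit;

public class TtlExample {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, String> sessions = hz.getMap("sessions");

        // This entry is discarded roughly 60 seconds after it is written
        sessions.put("session-123", "user-42", 60, TimeUnit.SECONDS);

        hz.shutdown();
    }
}
```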

Transaction

A sequence of information exchange and related work (such as data store updating) that is treated as a unit for the purposes of satisfying a request and for ensuring data store integrity.

TSDB

A time-series database is a computer system that is designed to store and retrieve data records that are part of a “time series,” which is a set of data points that are associated with timestamps.

Vector search

An advanced information retrieval method that allows systems to go beyond highly organized, quantitative structured data, and capture the context and semantic meaning of qualitative unstructured data that doesn’t follow conventional models, including multimedia, textual, geospatial, and Internet of Things (IoT) data.

Workspace

A Hazelcast Flow Workspace is a collection of schemas, API specs and Taxi projects that describe data sources and provide a description of the data and capabilities they provide.

WSDL

Web Services Description Language.