Compact Serialization Binary Specification

The Compact Serialization Binary Specification defines the binary format of the Compact Serialization, the supported types, and the Compact Schema.

The supported data types include:

  • Fixed-size types like boolean, int8, int16, int32, int64, float32, and float64

  • Nullable versions of the fixed-size types

  • Variable-size types like string, decimal, time, date, timestamp, and timestamp with timezone

  • Arrays of the types listed above, and

  • Nested Compact serializable objects containing the types listed above, and arrays of them.

Note: Except fixed-size types, all other types are considered variable-size.

See Compact Serialization on implementing and using this Hazelcast-specific serialization option.

Introduction

Every serialized user object consists of a header and data. Compact Serialization includes a schema that describes the serialized data and a schema ID, represented by an 8-byte long fingerprint, that uniquely identifies the schema. The schema and the data are separate entities that are connected by the schema ID.

Data Types

The table below shows the data types that are supported by the Compact Serialization.

All primitive types, user types, array size, number of items, length of data, schema ID, and offsets are configured in Compact Serialization with a default endianness of BIG_ENDIAN.

You cannot redefine the endianness in Compact Serialization; it is inherited from existing configuration.

Type Java C++ C# Python Node.js Go SQL Description Fixed Size

boolean

boolean

bool

bool

bool

boolean

bool

BOOLEAN

true or false represented by 1 bit as either 1 or 0. Up to 8 booleans packed into a single byte

yes

int8

byte

int8_t

sbyte

int

number

int8

TINYINT

8 bit two’s complement signed integer

yes

int16

short

int16_t

short

int

number

int16

SMALLINT

16-bit two’s-complement signed integer

yes

int32

int

int32_t

int

int

number

int32

INTEGER

32-bit two’s-complement signed integer

yes

int64

long

int64_t

long

int

Long

int64

BIGINT

64-bit two’s-complement signed integer

yes

float32

float

float

float

float

number

float32

REAL

32-bit IEEE 754 floating-point number

yes

float64

double

double

double

float

number

float64

DOUBLE

64-bit IEEE 754 floating-point number

yes

string

String

optional<std::string>

string?

Optional[str]

string | null

*string

STRING

null or number of bytes in the string(int32) + UTF-8 string https://tools.ietf.org/html/rfc3629

no

decimal

BigDecimal

optional<hazelcast:client:decimal>

HBigDecimal?

Optional[decimal.Decimal]

BigDecimal | null

*types.Decimal

DECIMAL

null or
Arbitrary precision and scale floating-point number:
represented as unscaledValue x 10 ^ -scale
unscaledValue: Array of int8 (byte array containing the two’s-complement binary
representation in big-endian byte-order: the most significant byte is in the zeroth element.)
scale : single int32 for scale

no

time

LocalTime

optional<hazelcast:client:local_time>

HLocalTime?

Optional[datetime.time]

LocalTime | null

*types.LocalTime

TIME

null or
HH-MI-SS-NN
int8: hour
int8: minute
int8: seconds
int32: nanoseconds

no(since it is nullable)

date

LocalDate

optional<hazelcast:client:local_date>

HLocalDate?

Optional[datetime.date]

LocalDate | null

*types.LocalDate

DATE

null or
YYYY-MM-DD from -999999999-01-1 to 999999999-12-31
int32: year
int8: month
int8: dayOfMonth

no(since it is nullable)

timestamp

LocalDateTime

optional<hazelcast:client:local_date_time>

HLocalDateTime?

Optional[datetime.datetime]

LocalDateTime | null

*types.LocalDateTime

TIMESTAMP

null or
YYYY-MM-DD-HH-MI-SS-NN
int32: year
int8: month
int8: dayOfMonth
int8 : hour
int8: minute
int8: seconds
int32: nanoseconds

no(since it is nullable)

timestampWithTimeZone

OffsetDateTime

optional<hazelcast:client:offset_date_time>

HOffsetDateTime?

Optional[datetime.datetime]

OffsetDateTime | null

*types.OffsetDateTime

TIMESTAMP W/ TZ

null or
YYYY-MM-DD-HH-MI-SS-MM Zone
int32: year
int8: month
int8:dayOfMonth
int8 : hour
int8: minute
int8: seconds
int32: nanoseconds
int32 : offsetSeconds.
offsetSeconds is range between +/-18:00:00 hour

no(since it is nullable)

compact

T

template<typename T> optional<T>

T?

Optional[Any]

T | null

interface{}

OBJECT

A user defined compact

no

boolean[]

boolean[]

optional<std::vector<bool>>

bool[]?

Optional[list[bool]]

boolean[] | null

[]bool

Array of booleans

no

int8[]

byte[]

optional<std::vector<int8_t>>

sbyte[]?

Optional[list[int]]

Buffer | null

[]int8

Array of int8s

no

int16[]

short[]

optional<std::vector<int16_t>>

short[]?

Optional[list[int]]

number[] | null

[]int16

Array of int16s

no

int32[]

int[]

optional<std::vector<int32_t>>

int[]?

Optional[list[int]]

number[] | null

[]int32

Array of int32s

no

int64[]

long[]

optional<std::vector<int64_t>>

long[]?

Optional[list[int]]

Long[] | null

[]int64

Array of int64s

no

float32[]

float[]

optional<std::vector<float>>

float[]?

Optional[list[float]]

number[] | null

[]float32

Array of float32s

no

float64[]

double[]

optional<std::vector<double>>

double[]?

Optional[list[float]]

number[] | null

[]float64

Array of float64s

no

string[]

String[]

optional<std::vector<optional<std::string>>>

string?[]?

Optional[list[Optional[str]]]

(string | null)[] | null

[]*string

Array of strings

no

decimal[]

BigDecimal[]

optional<std::vector<optional<decimal>>>

HBigDecimal?[]?

Optional[list[Optional[decimal.Decimal]]]

(BigDecimal | null)[] | null

[]*types.Decimal

Array of Decimals

no

time[]

LocalTime[]

optional<std::vector<optional<hazelcast:client:local_time>>>

HLocalTime?[]?

Optional[list[Optional[datetime.time]]]

(LocalTime | null)[] | null

[]*types.LocalTime

Array of Times

no

date[]

LocalDate[]

optional<std::vector<optional<hazelcast:client:local_date>>>

HLocalDate?[]?

Optional[list[Optional[datetime.date]]]

(LocalDate | null)[] | null

[]*types.LocalDate

Array of Dates

no

timestamp[]

LocalDateTime[]

optional<std::vector<optional<hazelcast:client:local_date_time>>>

HLocalDateTime?[]?

Optional[list[Optional[datetime.datetime]]]

(LocalDateTime | null)[] | null

[]*types.LocalDateTime

Array of Timestamps

no

timestampWithTimeZone[]

OffsetDateTime[]

optional<std::vector<optional<hazelcast:client:offset_date_time>>>

HOffsetDateTime?[]?

Optional[list[Optional[datetime.datetime]]]

(OffsetDateTime | null)[] | null

[]*types.OffsetDateTime

Array of TimestampWithTimeZones

no

compact[]

T[]

template<typename T> optional<std::vector<optional<T>>>

T?[]?

Optional[list[Optional[Any]]]

(T | null)[] | null

[]interface{}

Array of compacts

no

nullable-boolean

Boolean

optional<bool>

bool?

Optional[bool]

boolean | null

*bool

null or int8 1 for true int8 0 for false

no

nullable-int8

Byte

optional<int8_t>

sbyte?

Optional[int]

number | null

*int8

An int8 that can also be null

no

nullable-int16

Short

optional<int16_t>

short?

Optional[int]

number | null

*int16

An int16 that can also be null

no

nullable-int32

Integer

optional<int32_t>

int?

Optional[int]

number | null

*int32

An int32 that can also be null

no

nullable-int64

Long

optional<int64_t>

long?

Optional[int]

Long | null

*int64

An int64 that can also be null

no

nullable-float32

Float

optional<float>

float?

Optional[float]

number | null

*float32

A float32 that can also be null

no

nullable-float64

Double

optional<double>

double?

Optional[float]

number | null

*float64

A double that can also be null

no

nullable-boolean[]

Boolean[]

optional<std::vector<optional<bool>>>

bool?[]?

Optional[list[Optional[bool]]]

(boolean | null)[] | null

[]*bool

Array of nullable booleans

no

nullable-int8[]

Byte[]

optional<std::vector<optional<int8_t>>>

sbyte?[]?

Optional[list[Optional[int]]]

(number | null)[] | null

[]*int8

Array of nullable int8s

no

nullable-int16[]

Short[]

optional<std::vector<optional<int16_t>>>

short?[]?

Optional[list[Optional[int]]]

(number | null)[] | null

[]*int16

Array of nullable i1int6s

no

nullable-int32[]

Integer[]

optional<std::vector<optional<int32_t>>>

int?[]?

Optional[list[Optional[int]]]

(number | null)[] | null

[]*int32

Array of nullable int32s

no

nullable-int64[]

Long[]

optional<std::vector<optional<int64_t>>>

long?[]?

Optional[list[Optional[int]]]

(Long | null)[] | null

[]*int64

Array of nullable int64s

no

nullable-float32[]

Float[]

optional<std::vector<optional<float>>>

float?[]?

Optional[list[Optional[float]]]

(number | null)[] | null

[]*float32

Array of nullable float32s

no

nullable-float64[]

Double[]

optional<std::vector<optional<double>>>

double?[]?

Optional[list[Optional[float]]]

(number | null)[] | null

[]*float64

Array of nullable float64

no

Type IDs

Each type supported in the wire format has its type ID. The type IDs are used while constructing the schemas, performing type checks when accessing fields, and are exposed in a public API.

Nullable Primitives

Nullable primitives are implemented as variable-sized types. The null values of nullable primitives are represented like null variable-sized fields, with the offset of -1 and no data.

The partition hash and the type ID are common for all serialization methods supported by Hazelcast, including Compact Serialization. Every serialized object has a header and the payload on the wire.

Name

Type

Description

Partition hash

i32

BIG_ENDIAN integer, used for key objects. Not applicable to value objects.

Type ID

i32

BIG_ENDIAN integer that determines the serializer to be used. -55 for compact.

Var-size Objects

Var-size objects are user-defined objects whose binary representation consists of Header, Data, and Offsets sections, given in this order.

Header

Name

Type

Description

Schema ID

i64

Schema Hash.

Data length

i32

Length of the Data Section.

Data

Name

Description

Fixed-size Fields

Fixed-size field offsets are deduced from the schema.

Variable-size Fields

Offsets

Name

Type

Description

Variable-Size FieldOffset index 0

u8/u16/i32

The index of a field offset is written in the schema. Offsets of variable length fields. -1 for null

Variable-Size FieldOffset index 1

u8/u16/i32

The index of a field offset is written in the schema. Offsets of variable length fields. -1 for null

…​

…​

Variable-Size FieldOffset index n

u8/u16/i32

The index of a field offset is written in the schema. Offsets of variable length fields. -1 for null

Note: If the composed data does not include any variable-size field in the schema, Variable-Size FieldOffset and DataLength will not exist on the wire. Similarly, if there is no fixed-size field in the schema, Fixed-Size Fields will not exist on the wire.

Variable-Size FieldOffset`s are calculated from the beginning of the `DATA SECTION shown in the table above.

Variable-Size FieldOffset sizes vary depending on the Data Length.

  • Data Length <= 254, offsets are u8 (255 is reserved for null)

  • Data Length <= 65534, offsets are u16 (65535 is reserved for null)

  • Otherwise, offsets are i32.

Length is written before offsets so that the binary can be skipped even when the schema cannot be found.

A Variable-Size FieldOffset is -1 if a Variable-Size field is null.

Fixed-Size Fields cannot be null.

Fixed-Size Fields

The fixed-size fields are written after the Length field, consecutively. Fixed-size fields are accessed via their offsets that are calculated from the schema.

On the schema, the offset for a fixed-size field is determined as follows:

  • The first field always starts from offset 0.

  • Fields are ordered by their size in descending order.

  • When field sizes are the same, fields are ordered by the field name.

  • Each offset is calculated by adding the size of the last field to the last offset.

The only exception to the above rule is boolean fields. The size of a boolean is a byte, but 8 booleans can be packed into a single byte and these booleans can share the same offset and byte. To achieve that, extra information is stored in the schema (nothing extra on the data) for the bit index of the boolean fields. Boolean fields are written at the end of the fixed-size fields.

Variable-size Fields

The offsets of variable-size fields are written at the end in the alphabetical order of the field names. Each variable-size field offset has an index, starting from 0, written in the schema. To read a variable-size field from the data, one should read the index of the offset from the schema. Then, the offset associated with the related index is read from the end of the data. The variable-size field can be read using this offset.

On the schema, the index for a variable-size field is determined as follows:

  • The fields are given the index incrementally, according to the order of the field names, starting from 0.

Based on the length of the serialized data, the offsets of the variable-size fields might be represented by 1, 2, or 4 bytes. That is, all variable-size field offsets are either 1, 2, or 4 bytes per serialized object, depending on the field size.

Schema

Name

Type

type name

string

number of fields

i32

name of field 0

string

type ID of field 0

i32

name of field 1

string

type ID of field 1

i32

…​

…​

name of field n

string

type ID of field n

i32

A schema keeps the type name of the Compact type and some fields. A schema represents the structure of a Compact serialized data and allows reading the data. The type name is used to differentiate the user types which have the same number of fields with the same types.

Only schema ID is written to serialized data, however, schemas need to be exchanged between a client and a member to be able to serialize that data. A schema orders its fields by field name during initialization, so that the schema in both the client and the member produces the same schema ID.

During initialization, field offsets and indexes are assigned as well. Offsets are pointers to the locations of the fields in the serialized data. Indexes are for enumerating variable-size fields' offsets. Variable-size fields' offsets are written in the serialized data. For details on how fixed-size field offsets are assigned, see Fixed-Size Fields. For details on how variable-size field indexes and offsets are assigned, see Variable-size Fields.

In the schema class, each field will either:

  • have a positive offset, if it is a fixed-size field

  • have a positive index if it is a variable-size field

  • have a positive bit offset if it is a boolean field, which is the offset within the byte given by the normal offset.

Schema ID

We are using 64bit Rabin fingerprint to create a schema ID.

Rabin fingerprint is chosen mostly because it is recommended in Avro’s documentation as follows.

At the opposite extreme, the smallest fingerprint recommended is a 64-bit Rabin fingerprint. Below, there is a provided pseudo-code for this algorithm that can be easily translated into any programming language. 64-bit fingerprints should guarantee uniqueness for schema caches of up to a million entries (for such a cache, the chance of a collision is 3E-8). It is not recommended to use shorter fingerprints, as the chances of collisions are too high (for example, with 32-bit fingerprints, a cache with as few as 100,000 schemas has a 50% chance of having a collision).

According to the quote, even with a schema cache with a million entries, the chance of a collision is very low. Therefore, there should not be a need to the change number of bits of the hashing algorithm soon.

The schema ID is calculated from the byte array representation of the schema described above.

The implementation is as follows:

long fingerprint64(byte[] buf) {
  if (FP_TABLE == null) initFPTable();
  long fp = EMPTY;
  for (int i = 0; i < buf.length; i++)
    fp = (fp >>> 8) ^ FP_TABLE[(int)(fp ^ buf[i]) & 0xff];
  return fp;
}

static long EMPTY = 0xc15d213aa4d7a795L;
static long[] FP_TABLE = null;

void initFPTable() {
  FP_TABLE = new long[256];
  for (int i = 0; i < 256; i++) {
    long fp = i;
    for (int j = 0; j < 8; j++)
      fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
    FP_TABLE[i] = fp;
  }
}

Arrays

Arrays of fixed-size items can not have null items. On the other hand, arrays of variable-size items may contain null items.

Array of Fixed-size Items

Name Type

Number of items

i32

item 0

item type

item 1

item type

item 2

item type

item n

item type

Array of Variable-size Items

Consists of Header, Data, and Offsets sections in this order.

Header

Name

Type

Data length

i32

Number of items

i32

Data

Name

Type

Item 0

item type

Item 1

item type

…​

…​

Item n

item type

Offsets

Name

Type

Item 0 offset

u8/u16/i32

Item 1 offset

u8/u16/i32

…​

…​

Item n offset

u8/u16/i32


An array can contain only a single type of item. In the case of Compact[], all the items must have the same schema, that is, their schema ID must be equal.

Offsets are calculated from the beginning of the Data section.

Data Length is the length of the Data section.

Offset sizes vary depending on the Data Length.

  • For Data Length <= 254, offsets are u8 (255 is reserved for null)

  • For 255 < Data Length <= 65534, offsets are u16 (65535 is reserved for null)

  • For Data Length > 65535, offsets are i32.

Variable-size items can be null. The corresponding offset will be set to -1 in that case.

Nullable Values

Nullable fields can be null. The null values are represented with -1 offset in the binary, and no more data is written. On the other hand, non-nullable fields always take up space in the binary.

Fixed-size fields are non-nullable. Variable-size fields are nullable.