This is a prerelease version.

View latest

Mapping to MongoDB

To query MongoDB data connections, you can create a mapping to them with the Mongo connector.

What is the Mongo Connector

The Mongo connector allows you to read from/write to a MongoDB database, and to execute SQL queries on Mongo collections directly from Hazelcast.

Supported SQL Statements

Installing the Connector

The Mongo Connector artifacts are published on the Maven repositories. Add the following lines to your pom.xml to include it as a dependency to your project:

<dependency>
    <groupId>com.hazelcast.jet</groupId>
    <artifactId>hazelcast-jet-mongodb</artifactId>
    <version>$5.4.0-SNAPSHOT</version>
</dependency>

or if you are using Gradle:

compile group: 'com.hazelcast.jet', name: 'hazelcast-jet-mongodb', version: $5.4.0-SNAPSHOT.
To be able to use SQL over MongoDB, you have to include hazelcast-sql as well as a dependency.

Permissions

Enterprise

If security is enabled, your clients may need permissions to use this connector. For details, see Securing Jobs.

Before you Begin

Before you can create a mapping to a MongoDB, you must have the following:

  • A $jsonSchema validation in the collection (see the schema documentation), or you have to have at least one element in the collection you want to create mapping for (for property type validation).

  • Enabled operations log (oplog, see the Mongo documentation) if you want to use streaming mappings.

Creating a MongoDB Mapping

The following example creates a mapping to a MongoDB database.

  1. In a MongoDB database, create a people collection. For example in Java, you would run the following command.

    CreateCollectionOptions options = new CreateCollectionOptions();
    ValidationOptions validationOptions = new ValidationOptions();
    validationOptions.validator(BsonDocument.parse(
           "{\n" +
                   "    $jsonSchema: {\n" +
                   "      bsonType: \"object\",\n" +
                   "      title: \"Object Validation\",\n" +
                   "      properties: {" +
                   "        \"personId\": { \"bsonType\": \"int\" },\n" +
                   "        \"name\": { \"bsonType\": \"string\" }\n" +
                   "      }\n" +
                   "    }\n" +
                   "  }\n"
    ));
    options.validationOptions(validationOptions);
    database.createCollection(collectionName, options);

    The ValidationOptions are not required, but recommended.

  2. Configure the data connection so that the client can be reused by multiple mappings.

    • XML

    • YAML

    • Java

    <hazelcast>
        <data-connection name="myMongo">
            <type>Mongo</type>
            <properties>
                <property name="connectionString">stringForMongo</property> (1)
            </properties>
            <shared>false</shared> (2)
        </data-connection>
    </hazelcast>
    data-connection:
      name: myMongo
      type: Mongo
      properties:
        connectionString: stringforMongo (1)
      shared: false (2)
    DataConnectionConfig dataConnectionConfig = new DataConnectionConfig()
            .setName("myMongo")
            .setType("Mongo")
            .setProperty("connectionString", connectionStringToMongo) (1)
            .setShared(false); (2)
    config.addDataConnectionConfig(dataConnectionConfig);
    1 Your connection string.
    2 Set to true if the connection is reusable.

    Instead of providing a single connectionString parameter, you may also want to provide host, username, password and (optionally) authDb.

    Instead of providing the configuration in YAML or XML, you can also run the following SQL query.

    CREATE DATA CONNECTION myMongo type Mongo SHARED
    OPTIONS (
    	‘connectionString’ = ‘<your connection string>’
    )
  3. Create the mapping.

    CREATE MAPPING people
    DATA CONNECTION myMongo; (1)
    1 The name of the data connection configuration on your members (see Step 2 above).

    In the above case, automatic schema inference will be used. You may also want to provide the schema explicitly as shown below.

    CREATE MAPPING people (
        firstName VARCHAR(100),
        lastName VARCHAR(100),
        age INT
    )
    DATA CONNECTION myMongo

    Notice that there is no mention of TYPE MONGO this time; it’s automatically assumed by the SQL engine when you provide MongoDB data connection. This works with both schema provided or not.

Specify the database name using one of the following options:

  • Add OPTIONS ('database' = 'myDatabase') in CREATE DATA CONNECTION

  • Add OPTIONS ('database' = 'myDatabase') in CREATE MAPPING

  • Use the full external name, e.g., CREATE MAPPING people EXTERNAL NAME myDatabase.people (…​)

Supported Object Types

There may be one or more object types for a connector. The Mongo SQL connector has two object types:

  • Collection: Represents a batch read from a given collection. This is the default object type.

  • ChangeStream: Represents reading a stream of events.

To change the object type, append OBJECT TYPE X after DATA CONNECTION / TYPE. For example:

  • Using Data Connection

  • Using TYPE

CREATE MAPPING people (
    firstName VARCHAR(100),
    lastName VARCHAR(100),
    age INT
)
DATA CONNECTION myMongo
OBJECT TYPE ChangeStream
OPTIONS (...)
CREATE MAPPING people (
firstName VARCHAR(100),
lastName VARCHAR(100),
age INT
)
TYPE Mongo
OBJECT TYPE ChangeStream
OPTIONS (...)

Field Mappings

Object type Collection resolves columns to the names and types of the properties of a Mongo collection. For ChangeStream it is more complicated. The ChangeStream object contains the MongoDB collection properties, prefixed with fullDocument.. There are also some predefined, top-level columns without the fullDocument. prefix:

  • operationType STRING: Operation that triggered this change event, e.g. insert.

  • resumeToken STRING: Resume token associated with the given change stream event.

  • wallTime DATE_TIME: Wall time of the event. The date and time of the database operation on the server.

  • ts TIMESTAMP: Timestamp of the event. This is either equal to the wall time of the event, if provided, or to the current time on the Hazelcast member.

  • clusterTime TIMESTAMP: Cluster time of the event. The timestamp associated with Mongo oplog entry.

For further information, see the Mongo documentation.

Available options

Best way is to configure MongoDB options via Data Connection. Options available when using Mongo Data Connections are:

There are some options that can be added only in CREATE MAPPING:

Name

Description

startAt

Defines moment in time from when the event stream should be read. The option is valid only if mapping has ChangeStream object type. This property should have value of:

- now - time in epoch milliseconds or - ISO-formatted instant in UTC timezone, like 2023-03-24T15:31:00Z

idColumn

Specifies which column should be used as a primary key in the queries. Default value is _id. Remember that idColumn should have index in Mongo.

forceReadTotalParallelismOne

Forces queries to be executed exactly on one member in one thread. Can be useful if you want to use, e.g., MongoDB Atlas serverless free tier, which currently does not support $function aggregates.

checkExistence

Configures when Hazelcast will perform database and collection existence checks to warn about non existing database and/or collection.

Possible values:

- only-initial - checks will be done only when the mapping is created. Recommended if you have many short-living queries (e.g. when mapping is used solely by GenericMapStore). - once-per-job - checks will be done as above and also during every query execution. This is the default value. - on-each-connect - similar to above, but in case of reconnection (e.g. restore after failure) it will perform the checks again. - never - means that checks won’t be performed at all.

Type Mapping

The type system in MongoDB and SQL is not exactly the same. This leads to potential confusion and the need for the type coercion.

Table 1. MongoDB Type Conversion
BSON Type SQL Type Java Type

DOUBLE

DOUBLE

DOUBLE

STRING

VARCHAR

STRING

OBJECT

OBJECT

org.bson.Document

ARRAY

OBJECT

LIST

BINDATA

-

-

UNDEFINED

-

-

OBJECTID

OBJECT

org.bson.ObjectId

BOOL

BOOLEAN

BOOLEAN

DATE

This represents seconds from Unix epoch in UTC timezone. Therefore, it’s not mapped to pure DATE SQL type nor LOCALDATE in Java (nor any formats with timezones).

DATE_TIME or TIMESTAMP

LOCALDATETIME

TIMESTAMP

DATE_TIME or TIMESTAMP

LOCALDATETIME

NULL

-

-

REGEX

OBJECT

org.bson.BsonRegularExpression

DBPOINTER

-

-

JAVASCRIPT

VARCHAR

STRING

JAVASCRIPTWITHSCOPE

OBJECT

org.bson.CodeWithScope

SYMBOL

-

-

INT (32 BIT)

INT

INT

LONG (64 BIT)

BIGINT

LONG

DECIMAL (128 BIT)

DECIMAL

BIGDECIMAL

MINKEY

OBJECT

org.bson.MinKey

MAXKEY

OBJECT

org.bson.MaxKey

The Java Type column represents an object returned by the SQL query if the object put into the collection is of given BSON type.

Note that, while Hazelcast is able to convert MongoDB type to the requested SQL type in the projection, the argument binding will not always work the same due to technical limitations. For example, you can have an object with the type TIMESTAMP represented as DATE_TIME, that after execution of SELECT it will give you LocalDateTime in Java client. However, binding LocalDateTime as an argument will not work, as only native MongoDB types will work for arguments. Same applies to, for example, having BSON column of type STRING mapped to INTEGER in SQL.

Type Coercion

The following table shows the possible and supported type coercions. All the default mappings from the previous section are always valid.

Table 2. MongoDB Type Conversion
Type of Provided Argument Resolved Insertion Type

LOCALDATETIME

BSONDATETIME

OFFSETDATETIME

BSONDATETIME

HazelcastJsonValue (JSON column)

DOCUMENT