This is a prerelease version.

Get Started with SQL Queries

In this tutorial, you learn the basics of querying in SQL by using the interactive SQL shell on your Hazelcast member to query a CSV file.

Before You Begin

To complete this tutorial, you need the following:

Prerequisites Useful resources

A Hazelcast cluster and an instance of Management Center running on your local network

Query Static Data

Using SQL, you can load static data from sources such as files and maps so you can tranform and analyze it.

In this step, you create a static file, load that data into Hazelcast, and query it from an SQL shell.

  1. Create a file named likes.csv.

  2. Add the following data to your file.

    id,name,likes
    1,Jerry,13
    2,Greg,108
    3,Mary,73
    4,Jerry,88

    This example file contains a record of the number of likes each person has.

  3. Connect to the SQL shell on your cluster member.

    • Binary

    • Docker

    Mac and Linux
    bin/hz-cli sql
    Windows
    bin/hz-cli.bat sql

    Replace the $LOCAL_IP placeholder with your member’s local IP address.

    docker run --network hazelcast-network -it --rm hazelcast/hazelcast:5.0-BETA-1 hazelcast --targets hello-world@$LOCAL_IP sql

    The --targets parameter tells the SQL shell to connect to the member at the given IP address in a cluster called hello-world.

    Make sure you mount the likes.csv file on the container that’s running your member.
  4. Use a CREATE MAPPING statement to give Hazelcast access to the data in the likes.csv file.

    CREATE MAPPING csv_likes (id INT, name VARCHAR, likes INT)
    TYPE File
    OPTIONS ('format'='csv',
        'path'='/absolute/path/to/current/folder', 'glob'='likes.csv');
    Make sure you replace the path option with the absolute path to your .csv file.
    The SQL service does not have a native storage system. Instead it uses external mappings to access various resources as if they were tables.
  5. Use a SELECT statement to query all the data in the likes.csv file.

    SELECT * FROM csv_likes;

    You should see the following:

    +------------+--------------------+------------+
    |          id|name                |       likes|
    +------------+--------------------+------------+
    |           1|Jerry               |          13|
    |           2|Greg                |         108|
    |           3|Mary                |          73|
    |           4|Jerry               |          88|
    +------------+--------------------+------------+
  6. Use the SUM() function to aggregate the total number of likes for each person and group the results by name.

    SELECT name, sum(likes) AS total_likes FROM csv_likes GROUP BY name;

    You should see the following:

    +--------------------+--------------------+
    |name                |         total_likes|
    +--------------------+--------------------+
    |Greg                |                 108|
    |Jerry               |                 101|
    |Mary                |                  73|
    +--------------------+--------------------+

    The results do not include a rows for each Jerry because the GROUP BY statement groups the results by name.

  7. Create a mapping to a new table called dislikes.

    CREATE MAPPING dislikes (
    name VARCHAR,
    dislikes INT
    ) TYPE IMap OPTIONS ('keyFormat'='int', 'valueFormat'='json');

    This table is mapped to a distributed map in Hazelcast where the key is an integer and the value is an object that’s serialized to JSON.

  8. Use SINK INTO statements to add some data to the map.

    SINK INTO dislikes VALUES(1, 'Greg', 1);
    SINK INTO dislikes VALUES(2, 'Jerry', 0);
    SINK INTO dislikes VALUES(3, 'Mary', 5);
    SINK INTO dislikes VALUES(4, 'Jerry', 0);
  9. Use the JOIN clause to merge results from the likes and dislikes tables so you can see who has the most likes and dislikes.

    SELECT csv_likes.name, csv_likes.likes, dislikes.dislikes
    FROM csv_likes
    JOIN dislikes
    ON csv_likes.id = dislikes.__key;
    +--------------------+------------+------------+
    |name                |       likes|    dislikes|
    +--------------------+------------+------------+
    |Jerry               |          13|           0|
    |Greg                |         108|           5|
    |Mary                |          73|           5|
    |Jerry               |          88|          20|
    +--------------------+------------+------------+
  10. Use the ORDER BY clause to order the results by name and use the LIMIT clause to limit them so that only the first two are displayed.

SELECT csv_likes.name, csv_likes.likes, dislikes.dislikes
FROM csv_likes
JOIN dislikes
ON csv_likes.id = dislikes.__key
ORDER BY csv_likes.name
LIMIT 2;

Next Steps

Learn how to query maps with SQL.

See more examples of the SELECT statement.