Hazelcast JSON SQL JOIN Query with Example: A Comprehensive Guide

1. Introduction

In today’s time, efficient querying and management of complex data structures is a necessity for data-driven applications. Hazelcast, a powerful in-memory data grid, is known for handling large-scale distributed data and provides robust querying features. One unique feature of Hazelcast JSON SQL JOIN Query is HazelcastJsonValue, which natively supports JSON data.

SQL (Structured Query Language) is a popular language that has been used for decades to query and manipulate data. Hazelcast takes this a step further by supporting SQL queries on distributed data structures such as maps, which allows developers to reap the benefits of distributed systems while still using familiar syntax.

SQL join is once One very powerful feature of SQL is join operations, which are used to combine data from multiple sources, based on related keys. This feature is very useful in scenarios where we need to link users’ profiles with their transaction records, associate employees with their departments, or aggregate data from different systems.

In Hazelcast JSON SQL JOIN Query article, we will see how SQL join queries can be implemented with JSON data in Hazelcast. We will learn how to store JSON data in Hazelcast maps, how to execute SQL join queries, and how to retrieve meaningful results. With a practical example, you will understand how simple and efficient Hazelcast makes handling distributed JSON data and harnessing the power of SQL.

Hazelcast JSON SQL JOIN Query with Example: A Comprehensive Guide

2. Understanding Hazelcast’s JSON Support

In Hazelcast JSON SQL JOIN Query, to handle the JSON data efficiently Hazelcast provides HazelcastJsonValue. The main purpose of HazelcastJsonValue is to store, query and process the JSON data without any transformation on the JSON data structure. If your application process and works on JSON data, HazelcastJsonValue class is very useful.

HazelcastJsonValue is a special data type that stores and manages JSON strings as-is. It does not treat JSON objects like plain text, but allows you to query and manipulate the attributes inside them.

`HazelcastJsonValue` Features

Direct JSON Handling: JSON strings can be stored directly without any conversion. The format and structure of the data remains completely intact.
Field-wise Querying: Specific fields of JSON can be queried through predicates or SQL queries.

3. What are Hazelcast SQL Joins?

In Hazelcast JSON SQL JOIN Query, using Hazelcast’s SQL engine, you can join two or more maps, just like you do in relational databases. Joins allow you to efficiently fetch and process related data.

Example:
Imagine you have two maps:

studentMap map, which contains employee details (like ID, name, and department ID).
departmentMap map, which contains department details (like ID and department name).

You can write a join query and fetch studentMap and their departmentMap names simultaneously.

Types of Joins Supported in Hazelcast:

INNER JOIN: This join fetches only those rows where there are matching records in both maps.
LEFT OUTER JOIN: This join fetches all rows from the left map (like studentMap) irrespective of whether there is a matching record in the right map (like departmentMap) or not.

4. Setting Up Hazelcast

Pre-requisite:

Java 8 or higher
Good understanding about Hazelcast Maps
Basics understanding of Jet engine to execute SQL queries

Hazelcast dependencies:

Add the following dependency to your pom.xml, hazelcast-sql dependency used for Query operations.

if you are using Maven:

<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast</artifactId>
    <version>5.5.0</version> <!-- Use the latest stable version -->
</dependency>
  
<dependency>
    <groupId>com.hazelcast</groupId>
    <artifactId>hazelcast-sql</artifactId>
    <version>5.5.0</version>
    <scope>runtime</scope>
</dependency>

In your development of Hazelcast SQL Query Example, If you’re using Gradle, include:

implementation 'com.hazelcast:hazelcast:5.5.0'
implementation 'com.hazelcast:hazelcast-sql:5.5.0'

Also explore the below articles:

5. Creating and Populating Maps

Hazelcast JSON SQL JOIN Query, create sample JSON data map studentMap, departmentMap and courseMap, which we have used for sample SQL demo code.

studentMap: Key Integer and accept value as HazelcastJsonValue class type
deparmentMap: Key Integer and accept value as HazelcastJsonValue class type, where departmentId used as a foreign key into studentMap
courseMap: Key Integer and accept value as HazelcastJsonValue class type, where List<Course> used as key into studentMap

Create a Hazelcast’s instance and enable Jet engine config, Jet engine will help us to execute the SQL queries

// Initialize Hazelcast instance
Config config = new Config();
config.getJetConfig().setEnabled(true); // Enable the Jet engine
HazelcastInstance hazelcastInstance = Hazelcast.newHazelcastInstance(config);

Create sample maps studentMap, deparmentMap, and courseMap used in the demo.

// Define maps
IMap<Integer, HazelcastJsonValue> studentMap = hazelcastInstance.getMap("studentMap");
IMap<Integer, HazelcastJsonValue> departmentMap = hazelcastInstance.getMap("departmentMap");
IMap<Integer, HazelcastJsonValue> courseMap = hazelcastInstance.getMap("courseMap");

Populate the JSON data into the maps.

Course course1 = new Course(501, "Physics");
Course course2 = new Course(502, "Chemistry");
Course course3 = new Course(501, "Maths");
courseMap.put(501, new HazelcastJsonValue(gson.toJson(course1)));
courseMap.put(502, new HazelcastJsonValue(gson.toJson(course2)));
courseMap.put(503, new HazelcastJsonValue(gson.toJson(course3)));

Student student1 = new Student(101, "Sachin", 1001, List.of(course1, course3));
Student student2 = new Student(102, "Rahul", 1002, List.of(course2));
Student student3 = new Student(103, "Rachel", 1001, List.of(course3));
Student student4 = new Student(104, "Villa", 1001, List.of(course1));
Student student5 = new Student(105, "Nev am", 1002, List.of(course2));
studentMap.put(101, new HazelcastJsonValue(gson.toJson(student1)));
studentMap.put(103, new HazelcastJsonValue(gson.toJson(student3)));
studentMap.put(102, new HazelcastJsonValue(gson.toJson(student2)));
studentMap.put(105, new HazelcastJsonValue(gson.toJson(student5)));
studentMap.put(104, new HazelcastJsonValue(gson.toJson(student4)));

Department department1 = new Department(1001, "Physics Department");
Department department2 = new Department(1002, "Chemistry Department");
departmentMap.put(1001, new HazelcastJsonValue(gson.toJson(department1)));
departmentMap.put(1002, new HazelcastJsonValue(gson.toJson(department2)));

Sample Student, Department, and Course POJO.

public class Student {
    private int studentId;
    private String studentName;

    private int departmentId;
    private List<Course> courses;

    //create setter and getter or use lombok lib
}

public class Course {
    private int courseId;
    private String courseName;

    //create setter and getter or use lombok lib
}

public class Department {
    private int id;
    private String departmentName;

    //create setter and getter or use lombok lib
}

6. SQL Data Type Mapping

Before starting Hazelcast JSON SQL JOIN query execution, create data type mappings for sudentMap, departmentMap, and courseMap. SqlService allows to apply the mapping and execute the queries.

SqlService sqlService = hazelcastInstance.getSql();

//Then we define the schema in Hazelcast
String studentMapping = "CREATE MAPPING studentMap " +
		"TYPE IMap " +
		"OPTIONS (" +
		"'keyFormat' = 'java', " +
		"'keyJavaClass' = 'java.lang.Integer', " +
		"'valueFormat' = 'java', " +
		"'valueJavaClass' = 'com.hazelcast.core.HazelcastJsonValue'" +
		")";

sqlService.execute(studentMapping);

String departmentMapping = "CREATE MAPPING departmentMap " +
		"TYPE IMap " +
		"OPTIONS (" +
		"'keyFormat' = 'java', " +
		"'keyJavaClass' = 'java.lang.Integer', " +
		"'valueFormat' = 'java', " +
		"'valueJavaClass' = 'com.hazelcast.core.HazelcastJsonValue'" +
		")";

sqlService.execute(departmentMapping);

String courseMapping = "CREATE MAPPING courseMap " +
		"TYPE IMap " +
		"OPTIONS (" +
		"'keyFormat' = 'java', " +
		"'keyJavaClass' = 'java.lang.Integer', " +
		"'valueFormat' = 'java', " +
		"'valueJavaClass' = 'com.hazelcast.core.HazelcastJsonValue'" +
		")";

sqlService.execute(courseMapping);

7. Performing an SQL Join Query on JSON

SQL join query is similar to a relational database query, in HazelcastJsonValue query the data with JSON path by using of JSON_QUERY. You can use the $. attribute name to locate the specific JSON attribute into JSON data.

//create query to select specific JSON attribute from json map
String sqlQuery = "SELECT JSON_QUERY(a1.this, '$.studentId') AS studentId, " +
		"JSON_QUERY(a1.this, '$.studentName') AS studentName, " +
		"JSON_QUERY(a2.this, '$.id') AS departmentId, " +
		"JSON_QUERY(a2.this, '$.departmentName') as departmentName " +
		"FROM studentMap a1 JOIN departmentMap a2  " +
		"ON JSON_QUERY(a1.this, '$.departmentId') = JSON_QUERY(a2.this, '$.id')";

sqlService.execute(sqlQuery) responsible for query execution, which returns the SqlResult as response.

SqlResult result = sqlService.execute(sqlQuery);
for (SqlRow row : result) {
    //Sample result printing for demo
    System.out.print("   "+row.getObject("studentId"));
    System.out.print("   "+row.getObject("studentName"));
    System.out.print("   "+row.getObject("departmentId"));
    System.out.print("   "+row.getObject("departmentName"));
    System.out.println();
}

SqlResult as the result set which holds the data after query execution
SqlRow represents the collections of rows or the result set.
row.getObject(int), get the result value by specific column order
row.getObject("column name"), get the value by specific column name

8. Best Practices and Limitations

Best Practices

Use Indexes for Performance: Hazelcast SQL queries can be slow on large datasets if indexes are not used. Put indexes on fields that are frequently used in joins or filters. Example:

map.addIndex(IndexType.SORTED, "departmentId");

Optimize the query: By fetching only required fields to reduce network and processing overhead. Avoid using SELECT * unless necessary.
Monitor Query Performance: Optimize slow-running queries by using Hazelcast Management Center or query execution metrics.
Balance the distributed environment: Cluster should be properly balanced so that data remains evenly distributed and query execution is fast.
Keep Consistent Schema Design: Hazelcast is schema-less, but your data models must be logically aligned for joins to work properly. Use consistent key names for related fields.
Use Ad-Hoc Queries Carefully: Ad-hoc SQL Use queries only for exploratory analysis. Predefine frequent queries for better performance and stability.

Limitations

Only IMap is supported: Hazelcast SQL joins are currently available only for IMap data structures. Compatible with Multi-map or replicated map no.
High Memory Usage: During Joins execution, data is loaded into memory. If datasets are very large, then high memory usage or out-of-memory errors can occur.
Distributed Execution Latency: Distributed processing is a strength, but if the cluster configuration is not good or the data is unevenly distributed, latency can occur.
Complex Joins Can Be Slow: Complex queries with multiple joins can run slower than traditional databases especially with large datasets.
Lack of Advanced Join Types: Hazelcast does not yet support FULL OUTER JOIN or CROSS JOIN types.
Debugging Distributed Queries: Debugging Distributed SQL queries can be challenging, especially in large clusters. Logs and metrics It has to be analyzed carefully.

9. Conclusion

Hazelcast’s JSON SQL JOIN feature is very useful for those scenarios where relational-style queries are required but the data is stored in a distributed environment. By implementing the examples in this blog, you can practically understand the JSON SQL capabilities of Hazelcast.

If you have data in JSON format and want to execute relational-style queries, then Hazelcast’s JSON SQL query feature can be very helpful.

The Java sample program used on the demo are available over the GitHub.

Happy learning 😊