Hazelcast Vector Collections Explained in Detail: A Learning Guide

Overview

In contemporary AI and ML environments, vectors are the primary way to encode and search complex, high-dimensional data such as text, images, audio, and user preferences. Introduced in Hazelcast Platform 5.5, Vector Collections let developers implement semantic search, recommendation engines, and other AI features directly inside a distributed in-memory data grid, removing the need for a separate vector database.

This article provides a thorough look at Hazelcast Vector Collections: what they do, how they operate, the available configuration options and similarity metrics, and a complete Java example to demonstrate how to add vector search to your applications.

Table of Contents

  1. Introduction to Hazelcast Vector Collections
  2. What is a Vector Collection?
  3. Why Use Vector Collections in Hazelcast?
  4. Understanding Vectors and Embeddings
  5. Hazelcast VectorIndexConfig Explained
  6. Supported Similarity Metrics (Cosine, Euclidean, Dot)
  7. Indexing and Querying Vectors
  8. Hazelcast Vector Search Architecture
  9. Step-by-Step: Creating a Vector Collection in Hazelcast
  10. Complete Java Example: Hazelcast Vector Search Implementation
  11. Real-World Use Cases of Hazelcast Vector Collections
  12. Advantages of Hazelcast Vector Search
  13. Limitations and Best Practices
  14. Future of Vector Search in Hazelcast
  15. Conclusion

1. Introduction to Hazelcast Vector Collections

Hazelcast, a distributed in-memory data grid (IMDG), provides fast caching, data distribution, and stream processing. As AI applications proliferate, data increasingly carries context and meaning rather than just simple key-value pairs.

To address this shift, Hazelcast added Vector Collections, a native mechanism for storing and querying high-dimensional numeric embeddings. These embeddings let systems identify semantically similar items (documents, sentences, images) based on meaning instead of exact keyword matches.


2. What is a Vector Collection?

A Hazelcast Vector Collection is a distributed data structure built to store vector embeddings (compact numerical representations of items) and to enable similarity-based searches. Each item is represented by a vector of numbers, and you can query the collection to find items whose vectors are most similar to a given input using metrics such as cosine similarity or Euclidean distance.

In simpler terms, a vector collection:

  • Stores each item as a vector, that is, a list of floating-point numbers.
  • Lets you search for items whose vectors are most similar to a given input.
  • Computes similarity with metrics such as cosine similarity or Euclidean distance.

Example:
If you store embeddings of product descriptions, you can query the vector collection to find products that are similar in meaning, not just products that share keywords.

3. Why Use Vector Collections in Hazelcast?

Hazelcast Vector Collections bring AI capabilities to distributed systems, combining semantic search with low-latency in-memory performance.

Here are some reasons why developers prefer them:

  • Distributed and Scalable – Handles large datasets across multiple nodes.
  • Low Latency – In-memory computation keeps response times low, typically in the millisecond range.
  • Integration with Embedding Models – Works with vector embeddings generated by models like OpenAI, BERT, or Sentence Transformers.
  • Seamless Hazelcast Integration – Reuse existing Hazelcast infrastructure and management tools.
  • High Availability – Fault-tolerant and replicated across the cluster.

4. Understanding Vectors and Embeddings

Before diving deeper, it’s important to understand what vector embeddings are.

A vector embedding is a numerical representation of data (text, image, or sound) in a multi-dimensional space.
For example:

  • The word “King” could be represented as [0.52, 0.33, -0.21, ...]
  • The word “Queen” would have a similar vector, located close to “King” in that space.

This allows algorithms to measure semantic similarity: King and Queen are closer to each other than King and Car.

When Hazelcast stores these vectors, it can efficiently compare and search for related vectors using similarity metrics.
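
To make the King/Queen/Car intuition concrete, here is a minimal sketch in plain Java. The three 4-dimensional vectors are made-up values chosen for illustration (real embeddings come from a model and have hundreds of dimensions), and the class name is hypothetical.

```java
final class EmbeddingSimilarityDemo {
    // Made-up 4-dimensional "embeddings", for illustration only.
    static final float[] KING = {0.52f, 0.33f, -0.21f, 0.40f};
    static final float[] QUEEN = {0.50f, 0.35f, -0.18f, 0.44f};
    static final float[] CAR = {-0.30f, 0.10f, 0.62f, -0.05f};

    /** Cosine similarity: dot(a, b) / (|a| * |b|). */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // With these toy values, King scores higher against Queen than against Car.
        System.out.printf("king~queen: %.3f%n", cosine(KING, QUEEN));
        System.out.printf("king~car:   %.3f%n", cosine(KING, CAR));
    }
}
```

The absolute scores depend entirely on the toy values; what matters is the ordering, which is the property a similarity search relies on.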

5. Hazelcast VectorIndexConfig Explained

A Vector Collection's index is defined with VectorIndexConfig, a configuration class that describes how the vector data is indexed and searched. One or more index configurations are attached to the collection's VectorCollectionConfig.

Here are the main parameters:

Parameter | Description
name | Name of the index.
dimension | The number of dimensions in the vector (e.g., 128, 256, 768).
metric | Defines how similarity is measured (COSINE, EUCLIDEAN, DOT).
maxDegree | Controls the maximum number of neighbors (edges) per node in the index graph. Influences search quality, speed, and memory use.
efConstruction | Size of the candidate queue used while building the index. Balances index accuracy against build speed.
useDeduplication | Whether identical vectors are deduplicated in the index.

These parameters follow the conventions of the HNSW (Hierarchical Navigable Small World) algorithm, a state-of-the-art approach for efficient vector similarity search. In HNSW terminology, a separate query-time setting, commonly called efSearch, controls the number of candidate neighbors explored during queries (higher = more accurate, slower).
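
Because index parameters trade accuracy against speed, it helps to measure accuracy rather than guess. A common measure is recall@k: the fraction of the true top-k neighbors that the approximate search also returned. The sketch below is plain Java with a hypothetical class name; it assumes you already have the exact top-k keys (for example from a brute-force scan) and the keys returned by the index.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RecallAtK {
    /** Fraction of the exact top-k keys that also appear in the approximate result. */
    static double recall(List<String> exactTopK, List<String> approxTopK) {
        if (exactTopK.isEmpty()) {
            throw new IllegalArgumentException("exactTopK must not be empty");
        }
        Set<String> truth = new HashSet<>(exactTopK);
        Set<String> returned = new HashSet<>(approxTopK);
        returned.retainAll(truth); // keep only the keys that are true neighbors
        return (double) returned.size() / truth.size();
    }

    public static void main(String[] args) {
        List<String> exact = List.of("a", "b", "c", "d");
        List<String> approx = List.of("a", "c", "x", "d"); // "b" was missed
        System.out.println("recall@4 = " + recall(exact, approx));
    }
}
```

Tracking this number while changing one parameter at a time shows whether a slower setting actually buys better results for your data.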

6. Supported Similarity Metrics (Cosine, Euclidean, Dot)

Hazelcast supports three major similarity metrics, allowing flexibility based on your use case.

a. Cosine Similarity

  • Measures the angle between two vectors.
  • Best for comparing text embeddings where magnitude doesn’t matter.
  • Values range from -1 (opposite direction) to +1 (same direction).
  • Formula: $\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$

b. Euclidean Distance

  • Measures the straight-line distance between two points.
  • Works well for physical or spatial data.
  • Smaller values mean higher similarity.
  • Formula: $d(A, B) = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2}$

c. Dot Product

  • Computes the dot product of two vectors.
  • Often used when vectors are normalized or scaled embeddings.
  • The higher the dot product, the more similar the vectors.
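
The three metrics do not always agree on which vector is "closest". The following sketch, in plain Java with hypothetical names and made-up 2-dimensional vectors, shows a case where the dot product prefers a long vector pointing in a different direction, while cosine similarity and Euclidean distance prefer the vector that nearly matches the query.

```java
final class MetricComparison {
    // All methods assume both vectors have the same length.
    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    static double cosine(float[] a, float[] b) {
        return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
    }

    static double euclidean(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        float[] near = {0.9f, 0.1f}; // almost the same direction, similar length
        float[] big = {5f, 5f};      // 45 degrees away, much larger magnitude

        System.out.printf("dot:       near=%.3f big=%.3f%n", dot(query, near), dot(query, big));
        System.out.printf("cosine:    near=%.3f big=%.3f%n", cosine(query, near), cosine(query, big));
        System.out.printf("euclidean: near=%.3f big=%.3f%n", euclidean(query, near), euclidean(query, big));
    }
}
```

This is why the dot product is usually paired with normalized embeddings: once magnitudes are equal, it ranks results the same way cosine similarity does.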

7. Indexing and Querying Vectors

Once you define a VectorIndexConfig, Hazelcast automatically builds an HNSW index. This index allows fast approximate nearest neighbor (ANN) searches.

Key operations:

  • Insert: Add a vector to the collection.
  • Query: Search for the top-N most similar vectors to an input.
  • Delete: Remove a vector from the collection.

You can query using the Hazelcast client API by passing an input vector and getting the most similar matches with their distance scores.
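
To see what the index is approximating, here is a minimal brute-force sketch of the same three operations in plain Java. It is not Hazelcast code: the class and method names are hypothetical, it runs on a single JVM, and it scores every stored vector with cosine similarity, which is exact but slow for large collections. An ANN index returns (nearly) the same answers while visiting far fewer vectors.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class BruteForceVectorStore {
    private final Map<String, float[]> vectors = new HashMap<>();
    private final int dimension;

    BruteForceVectorStore(int dimension) {
        this.dimension = dimension;
    }

    /** Insert: add (or overwrite) a vector under the given key. */
    void put(String key, float[] vector) {
        requireDimension(vector);
        vectors.put(key, vector.clone());
    }

    /** Delete: remove a vector; returns true if the key existed. */
    boolean delete(String key) {
        return vectors.remove(key) != null;
    }

    /** Query: the top-N keys by cosine similarity, best match first. */
    List<Map.Entry<String, Double>> query(float[] input, int topN) {
        requireDimension(input);
        Comparator<Map.Entry<String, Double>> byScore = Map.Entry.comparingByValue();
        // Min-heap that holds only the best topN entries seen so far.
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(byScore);
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            heap.add(Map.entry(e.getKey(), cosine(input, e.getValue())));
            if (heap.size() > topN) {
                heap.poll(); // drop the current worst candidate
            }
        }
        List<Map.Entry<String, Double>> result = new ArrayList<>(heap);
        result.sort(byScore.reversed());
        return result;
    }

    private void requireDimension(float[] v) {
        if (v.length != dimension) {
            throw new IllegalArgumentException("expected " + dimension + " dimensions, got " + v.length);
        }
    }

    private static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A store like this also makes a handy ground-truth oracle when checking how accurate an approximate index is on a sample of your data.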

8. Hazelcast Vector Search Architecture

Here’s how vector search fits into the Hazelcast ecosystem:

  1. Client Application – Sends a vector query (like a sentence embedding).
  2. Hazelcast Cluster – Holds distributed vector collections with indexes.
  3. HNSW Index Layer – Executes approximate nearest neighbor searches.
  4. Result Aggregation – Combines results from all cluster members.
  5. Response – Returns the top similar results to the client.

This architecture scales horizontally: as your data grows, you add more Hazelcast nodes.
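
The result aggregation step can be pictured as a scatter-gather merge: each member returns its local best matches, and the partial lists are combined into one global top-k. The sketch below illustrates that idea in plain Java. It is an assumption about the general pattern, not Hazelcast's internal implementation, and the Hit and TopKMerge names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TopKMerge {
    /** One search hit from a member: a key and its similarity score (higher is better). */
    static final class Hit {
        final String key;
        final double score;

        Hit(String key, double score) {
            this.key = key;
            this.score = score;
        }
    }

    /** Combine per-member result lists and keep the k best hits overall. */
    static List<Hit> merge(List<List<Hit>> perMemberHits, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> hits : perMemberHits) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        int limit = Math.max(0, Math.min(k, all.size()));
        return new ArrayList<>(all.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Hit> member1 = List.of(new Hit("a", 0.9), new Hit("b", 0.5));
        List<Hit> member2 = List.of(new Hit("c", 0.8), new Hit("d", 0.7));
        for (Hit hit : merge(List.of(member1, member2), 3)) {
            System.out.println(hit.key + " " + hit.score);
        }
    }
}
```

For the merge to be correct, every member has to return at least its own top k, since in the worst case all global winners live on a single member.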

9. Step-by-Step: Creating a Vector Collection in Hazelcast

Let’s walk through how to create a Vector Collection in Hazelcast.

Step 1: Configure Hazelcast

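// Based on the Vector Collection API in Hazelcast Enterprise 5.5 (beta); check the Javadoc for your version.
// Imports: com.hazelcast.config.Config, com.hazelcast.config.vector.*, com.hazelcast.core.*, com.hazelcast.vector.*
Config config = new Config();

// The index definition: name, vector dimension, and similarity metric.
VectorIndexConfig vectorIndexConfig = new VectorIndexConfig()
        .setName("vectorIndex")
        .setDimension(128)
        .setMetric(Metric.COSINE);

// A vector collection has its own configuration, separate from maps.
VectorCollectionConfig collectionConfig = new VectorCollectionConfig("productVectors")
        .addVectorIndexConfig(vectorIndexConfig);
config.addVectorCollectionConfig(collectionConfig);

HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance(config);

Step 2: Store Vector Data

VectorCollection<String, String> productVectors =
        VectorCollection.getCollection(hazelcast, "productVectors");

// Each document pairs a value (here simply the product ID) with its embedding.
// putAsync returns a CompletionStage, so join() waits for the write to finish.
productVectors.putAsync("p1", VectorDocument.of("p1",
        VectorValues.of(new float[]{0.12f, 0.45f, 0.33f, /* ... */}))).toCompletableFuture().join();
productVectors.putAsync("p2", VectorDocument.of("p2",
        VectorValues.of(new float[]{0.14f, 0.47f, 0.32f, /* ... */}))).toCompletableFuture().join();

Step 3: Query Similar Vectors

float[] queryVector = new float[]{0.13f, 0.46f, 0.31f, /* ... */};

// Ask for the 3 nearest neighbors of the query vector.
SearchResults<String, String> results = productVectors.searchAsync(
        VectorValues.of(queryVector),
        SearchOptions.builder().limit(3).build()
).toCompletableFuture().join();

results.results().forEachRemaining(result ->
    System.out.println("Similar Product: " + result.getKey())
);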

That’s it! Hazelcast will return the top 3 most similar vectors to your query.

10. Complete Java Example: Hazelcast Vector Search Implementation

Here’s a complete Java example that demonstrates a working vector search in Hazelcast.

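// Based on the Vector Collection API in Hazelcast Enterprise 5.5 (beta); check the Javadoc for your version.
import com.hazelcast.config.Config;
import com.hazelcast.config.vector.Metric;
import com.hazelcast.config.vector.VectorCollectionConfig;
import com.hazelcast.config.vector.VectorIndexConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.vector.SearchOptions;
import com.hazelcast.vector.SearchResults;
import com.hazelcast.vector.VectorCollection;
import com.hazelcast.vector.VectorDocument;
import com.hazelcast.vector.VectorValues;

public class HazelcastVectorSearchExample {
    public static void main(String[] args) {
        // Configure a collection with a single 4-dimensional cosine index.
        VectorIndexConfig vectorIndex = new VectorIndexConfig()
                .setName("vectorIndex")
                .setDimension(4)
                .setMetric(Metric.COSINE);
        Config config = new Config();
        config.addVectorCollectionConfig(
                new VectorCollectionConfig("productVectors").addVectorIndexConfig(vectorIndex));

        HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance(config);
        try {
            VectorCollection<String, String> productVectors =
                    VectorCollection.getCollection(hazelcast, "productVectors");

            // Store three products; the value is the product name, the vector its embedding.
            put(productVectors, "Phone", new float[]{0.11f, 0.43f, 0.33f, 0.21f});
            put(productVectors, "Laptop", new float[]{0.12f, 0.45f, 0.34f, 0.22f});
            put(productVectors, "Headphones", new float[]{0.54f, 0.12f, 0.98f, 0.43f});

            // Search for the 2 nearest neighbors of the query vector.
            float[] queryVector = new float[]{0.10f, 0.44f, 0.32f, 0.20f};
            SearchResults<String, String> results = productVectors.searchAsync(
                    VectorValues.of(queryVector),
                    SearchOptions.builder().limit(2).build()
            ).toCompletableFuture().join();

            results.results().forEachRemaining(result ->
                    System.out.println("Similar item: " + result.getKey()));
        } finally {
            // Always release cluster resources, even if the search fails.
            hazelcast.shutdown();
        }
    }

    // Writes one document and waits for the asynchronous put to complete.
    private static void put(VectorCollection<String, String> collection, String key, float[] vector) {
        collection.putAsync(key, VectorDocument.of(key, VectorValues.of(vector)))
                .toCompletableFuture().join();
    }
}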

11. Real-World Use Cases of Hazelcast Vector Collections

Hazelcast Vector Collections can power various AI-driven applications, such as:

  • Semantic Search Engines – Search based on meaning, not keywords.
  • Recommendation Systems – Suggest similar products or content.
  • Document Clustering – Group similar articles or documents.
  • Image and Audio Retrieval – Match similar images or sounds.
  • Personalization Engines – Build tailored user experiences.

12. Advantages of Hazelcast Vector Search

  • In-Memory Speed – Low-latency vector queries served from memory.
  • Horizontal Scalability – Add nodes for more data and throughput.
  • Compatibility with ML Models – Works with embeddings from OpenAI, Hugging Face, etc.
  • Ease of Integration – Native Hazelcast API support.
  • Cluster Fault Tolerance – High reliability for enterprise systems.

13. Limitations and Best Practices

Limitations

  • Not ideal for extremely large vectors (>2048 dimensions).
  • High memory consumption for dense embeddings.
  • Limited to approximate nearest neighbor (ANN) searches.
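
To get a feel for the memory cost of dense embeddings, a back-of-the-envelope estimate is useful. The sketch below counts only the raw float values (4 bytes each); the index graph, object overhead, stored values, and backup copies all come on top, so treat the result as a lower bound. The class name and the example figures are hypothetical.

```java
final class VectorMemoryEstimate {
    /** Bytes needed for the raw float data alone: count x dimension x 4. */
    static long rawVectorBytes(long vectorCount, int dimension) {
        if (vectorCount < 0 || dimension <= 0) {
            throw new IllegalArgumentException("vectorCount must be >= 0 and dimension > 0");
        }
        // multiplyExact throws on overflow instead of silently wrapping.
        return Math.multiplyExact(Math.multiplyExact(vectorCount, (long) dimension), (long) Float.BYTES);
    }

    public static void main(String[] args) {
        long bytes = rawVectorBytes(1_000_000L, 768);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("%,d bytes (about %.2f GiB)%n", bytes, gib);
    }
}
```

One million 768-dimensional vectors already need roughly 3 GB for the floats alone, which is why memory monitoring matters for this workload.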

Best Practices

  • Normalize your vectors before storage.
  • Use Cosine similarity for text embeddings.
  • Optimize efSearch for a balance between speed and accuracy.
  • Regularly monitor cluster memory usage.
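
As an illustration of the first practice, the sketch below normalizes a vector to unit length (L2 normalization) in plain Java. The class name is hypothetical. After normalization, the dot product of two vectors equals their cosine similarity, so magnitude no longer influences the ranking.

```java
final class VectorNormalizer {
    /** Returns a copy of v scaled to unit length; the input array is not modified. */
    static float[] normalize(float[] v) {
        double sumOfSquares = 0;
        for (float x : v) {
            sumOfSquares += (double) x * x;
        }
        double norm = Math.sqrt(sumOfSquares);
        if (norm == 0.0) {
            throw new IllegalArgumentException("cannot normalize a zero vector");
        }
        float[] unit = new float[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = (float) (v[i] / norm);
        }
        return unit;
    }

    /** Dot product; assumes both vectors have the same length. */
    static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] a = normalize(new float[]{3f, 4f});
        float[] b = normalize(new float[]{4f, 3f});
        System.out.println("length of a squared: " + dot(a, a));
        System.out.println("dot(a, b), equal to the cosine of the originals: " + dot(a, b));
    }
}
```

Normalizing once at write time is cheaper than doing it on every query, and it keeps stored vectors and query vectors on the same scale.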

14. Future of Vector Search in Hazelcast

Hazelcast is rapidly evolving its AI and ML integration features. Future releases aim to:

  • Support hybrid queries combining SQL + Vector Search.
  • Enable multi-modal data (text + image embeddings).
  • Provide integration with AI pipelines and external vector databases.

This will position Hazelcast as a real-time AI data platform for large-scale applications.

15. Conclusion

Hazelcast Vector Collections represent a major step forward in combining distributed computing with AI-based vector similarity search. With this capability, developers can bring semantic search, recommendations, and other AI capabilities directly into their Hazelcast clusters, without needing external vector databases.

Whether you’re building a search engine, a recommendation system, or a machine learning pipeline, Hazelcast’s vector collection provides speed, scalability, and simplicity, making it an ideal choice for modern AI-driven systems.
