Overview
In contemporary AI and ML environments, vectors are the primary way to encode and search complex, high-dimensional data such as text, images, audio, and user preferences. Starting with Hazelcast 5.4, Vector Collections let developers implement semantic search, recommendation engines, and other AI features directly inside a distributed in-memory data grid, removing the need for a separate vector database.
This article provides a thorough look at Hazelcast Vector Collections: what they do, how they operate, the available configuration options and similarity metrics, and a complete Java example to demonstrate how to add vector search to your applications.
Table of Contents
- Introduction to Hazelcast Vector Collections
- What is a Vector Collection?
- Why Use Vector Collections in Hazelcast?
- Understanding Vectors and Embeddings
- Hazelcast VectorIndexConfig Explained
- Supported Similarity Metrics (Cosine, Euclidean, Dot)
- Indexing and Querying Vectors
- Hazelcast Vector Search Architecture
- Step-by-Step: Creating a Vector Collection in Hazelcast
- Complete Java Example: Hazelcast Vector Search Implementation
- Real-World Use Cases of Hazelcast Vector Collections
- Advantages of Hazelcast Vector Search
- Limitations and Best Practices
- Future of Vector Search in Hazelcast
- Conclusion
1. Introduction to Hazelcast Vector Collections
Hazelcast, a distributed in-memory data grid (IMDG), provides fast caching, data distribution, and stream processing. As AI applications proliferate, data increasingly carries context and meaning rather than just simple key-value pairs.
To address this shift, Hazelcast added Vector Collections, a native mechanism for storing and querying high-dimensional numeric embeddings. These embeddings let systems identify semantically similar items (documents, sentences, images) based on meaning instead of exact keyword matches.

2. What is a Vector Collection?
A Hazelcast Vector Collection is a distributed data structure built to store vector embeddings (compact numerical representations of items) and to enable similarity-based searches. Each item is represented by a vector of numbers, and you can query the collection to find the items whose vectors are most similar to a given input, using metrics such as cosine similarity or Euclidean distance.
In simpler terms:
- A vector collection stores each item as a vector: a list of floating-point numbers.
- It lets you search for items whose vectors are most similar to a given input.
- Similarity is computed with metrics such as cosine similarity or Euclidean distance.
Example:
If you store embeddings of product descriptions, you can query the vector collection to find products that are similar in meaning, not just products that share keywords.
3. Why Use Vector Collections in Hazelcast?
Hazelcast Vector Collections bring AI capabilities to distributed systems, combining semantic search with low-latency in-memory performance.
Here are some reasons why developers prefer them:
- Distributed and Scalable – Handles large datasets across multiple nodes.
- Low Latency – In-memory computation ensures millisecond-level response times.
- Integration with Embedding Models – Works with vector embeddings generated by models like OpenAI, BERT, or Sentence Transformers.
- Seamless Hazelcast Integration – Reuse existing Hazelcast infrastructure and management tools.
- High Availability – Fault-tolerant and replicated across the cluster.
4. Understanding Vectors and Embeddings
Before diving deeper, it’s important to understand what vector embeddings are.
A vector embedding is a numerical representation of data (text, image, or sound) in a multi-dimensional space.
For example:
- The word “King” could be represented as [0.52, 0.33, -0.21, ...].
- The word “Queen” would have a similar vector, close to “King” in the space.
This allows algorithms to measure semantic similarity — meaning that King and Queen are closer to each other than King and Car.
When Hazelcast stores these vectors, it can efficiently compare and search for related vectors using similarity metrics.
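To make this concrete, here is a tiny, self-contained Java sketch using made-up 3-dimensional vectors (real embeddings come from a model and have hundreds of dimensions). It shows “King” and “Queen” scoring as far closer than “King” and “Car” under cosine similarity:

```java
// Toy demo: cosine similarity on hand-made vectors.
// The numbers are invented for illustration, not real model output.
public class EmbeddingDistanceDemo {

    // Cosine similarity: dot product divided by the product of vector lengths.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] king  = {0.52, 0.33, -0.21};
        double[] queen = {0.49, 0.35, -0.18};
        double[] car   = {-0.30, 0.80, 0.45};

        System.out.printf("King~Queen: %.3f%n", cosine(king, queen)); // close to 1
        System.out.printf("King~Car:   %.3f%n", cosine(king, car));   // close to 0
    }
}
```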
5. Hazelcast VectorIndexConfig Explained
To create a Vector Collection, Hazelcast uses VectorIndexConfig, a configuration class that defines how the vector data is indexed and searched.
Here are the main parameters:
| Parameter | Description |
|---|---|
| name | Name of the index. |
| dimensions | The number of dimensions in the vector (e.g., 128, 256, 768). |
| similarityMetric | Defines how similarity is measured (COSINE, EUCLIDEAN, DOT). |
| maxNeighbors | Controls how many neighboring vectors to consider during search. |
| efConstruction | Balances between index accuracy and build speed. |
| efSearch | Controls the number of candidate neighbors explored during queries (higher = more accurate, slower). |
| m | Number of edges (connections) per node in the index graph. Influences search performance. |
These parameters come from the HNSW (Hierarchical Navigable Small World) algorithm, a state-of-the-art approach for efficient approximate vector similarity search.
6. Supported Similarity Metrics (Cosine, Euclidean, Dot)
Hazelcast supports three major similarity metrics, allowing flexibility based on your use case.
a. Cosine Similarity
- Measures the angle between two vectors.
- Best for comparing text embeddings where magnitude doesn’t matter.
- Values range from -1 (opposite) to +1 (identical).
- Formula: cosine(A, B) = (A · B) / (‖A‖ × ‖B‖)
b. Euclidean Distance
- Measures the straight-line distance between two points.
- Works well for physical or spatial data.
- Smaller values mean higher similarity.
- Formula: d(A, B) = √( Σᵢ (Aᵢ - Bᵢ)² )
c. Dot Product
- Computes the dot product of two vectors.
- Often used when vectors are normalized or scaled embeddings.
- The higher the dot product, the more similar the vectors.
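The differences between the three metrics are easiest to see by computing all of them on one pair of toy vectors (values invented for illustration). Note how cosine ignores magnitude entirely, while Euclidean distance does not:

```java
// Computes cosine similarity, Euclidean distance, and dot product
// for the same pair of vectors, to contrast the three metrics.
public class MetricComparison {

    static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += a[i] * b[i];
        return sum;
    }

    static double norm(double[] a) {
        return Math.sqrt(dot(a, a));
    }

    static double cosine(double[] a, double[] b) {
        return dot(a, b) / (norm(a) * norm(b));
    }

    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {2.0, 4.0, 6.0}; // same direction as a, twice the length

        System.out.printf("cosine:    %.3f%n", cosine(a, b));    // 1.000 (same direction)
        System.out.printf("euclidean: %.3f%n", euclidean(a, b)); // 3.742 (different magnitude)
        System.out.printf("dot:       %.3f%n", dot(a, b));       // 28.000
    }
}
```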
7. Indexing and Querying Vectors
Once you define a VectorIndexConfig, Hazelcast automatically builds an HNSW index. This index allows fast approximate nearest neighbor (ANN) searches.
Key operations:
- Insert: Add a vector to the collection.
- Query: Search for the top-N most similar vectors to an input.
- Delete: Remove a vector from the collection.
You can query using the Hazelcast client API by passing an input vector and getting the most similar matches with their distance scores.
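An HNSW index approximates what an exact brute-force top-N search would return. That exact baseline, which the index trades a little accuracy against for speed, can be sketched in plain Java, independent of any Hazelcast API:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Exact (brute-force) top-N nearest-neighbour search: score every stored
// vector against the query and keep the N best. HNSW avoids scoring
// everything, at the cost of occasionally missing a true neighbour.
public class BruteForceTopN {

    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    static List<String> topN(Map<String, double[]> store, double[] query, int n) {
        return store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(query, e.getValue())))
                .limit(n)
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, double[]> store = Map.of(
                "phone",      new double[]{0.11, 0.43, 0.33},
                "laptop",     new double[]{0.12, 0.45, 0.34},
                "headphones", new double[]{0.54, 0.12, 0.98});
        // The two closest items to the query win; "headphones" is filtered out.
        System.out.println(topN(store, new double[]{0.10, 0.44, 0.32}, 2));
    }
}
```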
8. Hazelcast Vector Search Architecture
Here’s how vector search fits into the Hazelcast ecosystem:
- Client Application – Sends a vector query (like a sentence embedding).
- Hazelcast Cluster – Holds distributed vector collections with indexes.
- HNSW Index Layer – Executes approximate nearest neighbor searches.
- Result Aggregation – Combines results from all cluster members.
- Response – Returns the top similar results to the client.
This architecture ensures horizontal scalability — as your data grows, you simply add more Hazelcast nodes.
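The result-aggregation step can be pictured as merging each member's partial top-k list into one global top-k. The following is a simplified, hypothetical sketch of that idea (the `Hit` record and method names are invented for illustration, not Hazelcast internals):

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical scatter-gather aggregation: each cluster member returns its
// local top-k (key, score) hits; the coordinator merges all partial lists
// and keeps the global top-k by descending score.
public class ResultAggregation {

    record Hit(String key, double score) {}

    static List<Hit> mergeTopK(List<List<Hit>> perMember, int k) {
        return perMember.stream()
                .flatMap(List::stream)
                .sorted(Comparator.comparingDouble((Hit h) -> h.score()).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        List<Hit> member1 = List.of(new Hit("p1", 0.97), new Hit("p4", 0.61));
        List<Hit> member2 = List.of(new Hit("p7", 0.88), new Hit("p2", 0.45));
        // Global top-2 across both members: p1 (0.97), then p7 (0.88).
        System.out.println(mergeTopK(List.of(member1, member2), 2));
    }
}
```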
9. Step-by-Step: Creating a Vector Collection in Hazelcast
Let’s walk through how to create a Vector Collection in Hazelcast.
Step 1: Configure Hazelcast
```java
// Note: the Vector Collection API is beta in recent Hazelcast releases;
// class and method names below follow the Hazelcast documentation and
// may need adjusting for your exact version.
Config config = new Config();
config.addVectorCollectionConfig(
        new VectorCollectionConfig("productVectors")
                .addVectorIndexConfig(
                        new VectorIndexConfig("vectorIndex", Metric.COSINE, 128)));
HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance(config);
```
Step 2: Store Vector Data
```java
// A VectorCollection pairs each key with a VectorDocument, which holds a
// user value plus the vector itself (truncated here; 128 dimensions total).
// As above, names follow the documented beta API and may vary by version.
VectorCollection<String, String> productVectors =
        VectorCollection.getCollection(hazelcast, "productVectors");
productVectors.putAsync("p1", VectorDocument.of("Product 1",
        VectorValues.of(new float[]{0.12f, 0.45f, 0.33f /* ... */})))
        .toCompletableFuture().join();
productVectors.putAsync("p2", VectorDocument.of("Product 2",
        VectorValues.of(new float[]{0.14f, 0.47f, 0.32f /* ... */})))
        .toCompletableFuture().join();
```
Step 3: Query Similar Vectors
```java
// As above, names follow the documented beta API and may vary by version.
float[] queryVector = new float[]{0.13f, 0.46f, 0.31f /* ... */};
SearchOptions options = SearchOptions.builder()
        .limit(3)          // top-3 nearest neighbours
        .includeValue()
        .build();
SearchResults<String, String> results = productVectors
        .searchAsync(VectorValues.of(queryVector), options)
        .toCompletableFuture().join();
results.results().forEachRemaining(result ->
        System.out.println("Similar Product: " + result.getKey()
                + " (score: " + result.getScore() + ")"));
```
That’s it! Hazelcast will return the top 3 most similar vectors to your query.
10. Complete Java Example: Hazelcast Vector Search Implementation
Here’s a complete Java example that demonstrates a working vector search in Hazelcast.
```java
// Note: the Vector Collection API is beta in recent Hazelcast releases and
// ships separately from the core map API; names below follow the Hazelcast
// documentation and may need adjusting for your exact version.
import com.hazelcast.config.Config;
import com.hazelcast.config.vector.Metric;
import com.hazelcast.config.vector.VectorCollectionConfig;
import com.hazelcast.config.vector.VectorIndexConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.vector.SearchOptions;
import com.hazelcast.vector.SearchResults;
import com.hazelcast.vector.VectorCollection;
import com.hazelcast.vector.VectorDocument;
import com.hazelcast.vector.VectorValues;

public class HazelcastVectorSearchExample {
    public static void main(String[] args) {
        // Declare a vector collection with a 4-dimensional cosine index.
        Config config = new Config();
        config.addVectorCollectionConfig(
                new VectorCollectionConfig("productVectors")
                        .addVectorIndexConfig(
                                new VectorIndexConfig("vectorIndex", Metric.COSINE, 4)));
        HazelcastInstance hazelcast = Hazelcast.newHazelcastInstance(config);

        VectorCollection<String, String> products =
                VectorCollection.getCollection(hazelcast, "productVectors");

        // Store each product together with its embedding.
        products.putAsync("Phone", VectorDocument.of("Phone",
                VectorValues.of(new float[]{0.11f, 0.43f, 0.33f, 0.21f})))
                .toCompletableFuture().join();
        products.putAsync("Laptop", VectorDocument.of("Laptop",
                VectorValues.of(new float[]{0.12f, 0.45f, 0.34f, 0.22f})))
                .toCompletableFuture().join();
        products.putAsync("Headphones", VectorDocument.of("Headphones",
                VectorValues.of(new float[]{0.54f, 0.12f, 0.98f, 0.43f})))
                .toCompletableFuture().join();

        // Search for the two items nearest to the query embedding.
        float[] queryVector = {0.10f, 0.44f, 0.32f, 0.20f};
        SearchOptions options = SearchOptions.builder().limit(2).includeValue().build();
        SearchResults<String, String> results = products
                .searchAsync(VectorValues.of(queryVector), options)
                .toCompletableFuture().join();
        results.results().forEachRemaining(result ->
                System.out.println("Similar item: " + result.getKey()));

        hazelcast.shutdown();
    }
}
```
11. Real-World Use Cases of Hazelcast Vector Collections
Hazelcast Vector Collections can power various AI-driven applications, such as:
- Semantic Search Engines – Search based on meaning, not keywords.
- Recommendation Systems – Suggest similar products or content.
- Document Clustering – Group similar articles or documents.
- Image and Audio Retrieval – Match similar images or sounds.
- Personalization Engines – Build tailored user experiences.
12. Advantages of Hazelcast Vector Search
- In-Memory Speed – Sub-millisecond vector queries.
- Horizontal Scalability – Add nodes for more data and throughput.
- Compatibility with ML Models – Works with embeddings from OpenAI, Hugging Face, etc.
- Ease of Integration – Native Hazelcast API support.
- Cluster Fault Tolerance – High reliability for enterprise systems.
13. Limitations and Best Practices
Limitations
- Not ideal for extremely large vectors (>2048 dimensions).
- High memory consumption for dense embeddings.
- Limited to approximate nearest neighbor (ANN) searches.
Best Practices
- Normalize your vectors before storage.
- Use Cosine similarity for text embeddings.
- Optimize efSearch for a balance between speed and accuracy.
- Regularly monitor cluster memory usage.
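The first best practice, normalization, is simple enough to show: scale each vector to unit length before storing it, so that cosine and dot-product rankings agree. This is a plain utility method, not a Hazelcast API:

```java
// L2-normalizes a vector: divides every component by the vector's length,
// so the result has length 1. Zero vectors are returned unchanged.
public class VectorNorm {

    static float[] normalize(float[] v) {
        double sumSq = 0;
        for (float x : v) sumSq += (double) x * x;
        double norm = Math.sqrt(sumSq);
        if (norm == 0) return v.clone();
        float[] out = new float[v.length];
        for (int i = 0; i < v.length; i++) out[i] = (float) (v[i] / norm);
        return out;
    }

    public static void main(String[] args) {
        float[] unit = normalize(new float[]{3f, 4f});
        System.out.println(unit[0] + ", " + unit[1]); // 0.6, 0.8
    }
}
```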
14. Future of Vector Search in Hazelcast
Hazelcast is rapidly evolving its AI and ML integration features. Future releases aim to:
- Support hybrid queries combining SQL + Vector Search.
- Enable multi-modal data (text + image embeddings).
- Provide integration with AI pipelines and external vector databases.
This will position Hazelcast as a real-time AI data platform for large-scale applications.
15. Conclusion
Hazelcast Vector Collections represent a major step forward in combining distributed computing with AI-based vector similarity search. With this capability, developers can easily bring semantic search, recommendations, and AI intelligence directly into their Hazelcast clusters, without needing external vector databases.
Whether you’re building a search engine, a recommendation system, or a machine learning pipeline, Hazelcast’s vector collection provides speed, scalability, and simplicity, making it an ideal choice for modern AI-driven systems.
You can also explore the Hazelcast documentation for more details.