Concurrency-aware adaptive vector retrieval engine for scalable Approximate Nearest Neighbor (ANN) search under dynamic workloads.
Modern vector retrieval systems often degrade in performance when queries, inserts, and updates run concurrently.
This project explores an adaptive routing architecture that improves retrieval efficiency and maintains low latency in dynamic environments.
The system combines:
- Concurrent query execution
- Adaptive query routing
- Delta indexing
- Snapshot-based retrieval
- ANN graph-based search
- Benchmark-driven evaluation
Traditional ANN systems are optimized for static datasets and struggle with:
- High-frequency inserts
- Real-time updates
- Concurrent workloads
- Query latency spikes
- Dynamic indexing
This project aims to design a lightweight retrieval architecture capable of balancing retrieval quality and system throughput under concurrent workloads.
The adaptive router routes incoming queries dynamically based on system load and query sensitivity.
The snapshot index stores stable indexed vectors optimized for low-latency ANN retrieval.
The delta index holds recently inserted vectors until they are merged into the snapshot.
The ANN search layer performs approximate nearest neighbor search using graph-based indexing.
The concurrent execution layer handles parallel inserts, updates, and retrieval requests.
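
As a rough illustration of how the router, snapshot, and delta could fit together, here is a minimal sketch. It is not the repository's implementation: the class `AdaptiveRouterSketch`, the `loadThreshold` parameter, the `freshnessSensitive` flag, and the policy of skipping the delta under high load are all assumptions made for this example, and a brute-force scan stands in for the graph-based ANN search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;

/** Hypothetical sketch: snapshot + delta store behind a load-aware router. */
class AdaptiveRouterSketch {
    /** A stored vector and its identifier. */
    static final class Entry {
        final int id;
        final float[] vector;
        Entry(int id, float[] vector) { this.id = id; this.vector = vector; }
    }

    // Immutable snapshot, swapped wholesale on merge so readers never lock.
    private volatile List<Entry> snapshot = Collections.emptyList();
    // Delta buffer holding inserts that have not been merged yet.
    private final CopyOnWriteArrayList<Entry> delta = new CopyOnWriteArrayList<>();
    // Number of queries currently executing; the (assumed) load signal.
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int loadThreshold;

    AdaptiveRouterSketch(int loadThreshold) { this.loadThreshold = loadThreshold; }

    void insert(int id, float[] vector) { delta.add(new Entry(id, vector)); }

    /** Publishes a new snapshot that includes the current delta, then clears those entries. */
    synchronized void merge() {
        List<Entry> drained = new ArrayList<>(delta);
        List<Entry> merged = new ArrayList<>(snapshot);
        merged.addAll(drained);
        snapshot = Collections.unmodifiableList(merged);
        delta.removeAll(drained); // inserts that arrived in the meantime stay in the delta
    }

    /** Returns the ids of the k nearest stored vectors (brute-force stand-in for ANN search). */
    List<Integer> query(float[] q, int k, boolean freshnessSensitive) {
        int load = inFlight.incrementAndGet();
        try {
            // Assumed policy: skip the delta under high load unless the query needs fresh data.
            boolean includeDelta = freshnessSensitive || load <= loadThreshold;
            // Read the delta before the snapshot so a concurrent merge cannot hide an entry.
            List<Entry> recent = includeDelta
                    ? new ArrayList<>(delta)
                    : Collections.<Entry>emptyList();
            Map<Integer, Entry> candidates = new HashMap<>();
            for (Entry e : snapshot) candidates.put(e.id, e);
            for (Entry e : recent) candidates.put(e.id, e); // the delta entry wins per id
            return candidates.values().stream()
                    .sorted(Comparator.comparingDouble((Entry e) -> squaredDistance(q, e.vector)))
                    .limit(k)
                    .map(e -> e.id)
                    .collect(Collectors.toList());
        } finally {
            inFlight.decrementAndGet();
        }
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

In this sketch a query marked as freshness-sensitive always sees unmerged inserts, while other queries fall back to the snapshot alone once the number of in-flight queries exceeds `loadThreshold`. That trades a little recall on recent data for lower latency under load.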
- Concurrent vector retrieval
- Adaptive routing strategy
- Dynamic indexing pipeline
- Snapshot + delta architecture
- ANN graph traversal
- Thread-safe operations
- Benchmark evaluation framework
- Modular system design
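
The ANN graph traversal listed above can be pictured as a greedy walk over a proximity graph: start at an entry node and keep moving to whichever neighbor is closest to the query until no neighbor improves. The sketch below is a simplified, single-threaded illustration of that general idea, not the project's code. The class `GreedyGraphSearchSketch` and its methods are hypothetical, and the graph is wired by hand, whereas a real index would choose the edges itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of greedy nearest-neighbor search over a proximity graph. */
class GreedyGraphSearchSketch {
    private final List<float[]> vectors = new ArrayList<>();
    private final List<List<Integer>> neighbors = new ArrayList<>();

    /** Adds a node and returns its index. */
    int addNode(float[] vector) {
        vectors.add(vector);
        neighbors.add(new ArrayList<>());
        return vectors.size() - 1;
    }

    /** Adds an undirected edge between two existing nodes. */
    void connect(int a, int b) {
        neighbors.get(a).add(b);
        neighbors.get(b).add(a);
    }

    /**
     * Walks from entryPoint toward the query, moving to the closest neighbor
     * at each step. Stops at a local minimum, so the result is approximate.
     */
    int search(float[] query, int entryPoint) {
        int current = entryPoint;
        double best = squaredDistance(query, vectors.get(current));
        while (true) {
            int next = current;
            for (int n : neighbors.get(current)) {
                double d = squaredDistance(query, vectors.get(n));
                if (d < best) {
                    best = d;
                    next = n;
                }
            }
            if (next == current) {
                return current; // no neighbor is closer than the current node
            }
            current = next;
        }
    }

    static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

Because the best distance strictly decreases on every move, the walk always terminates. It can stop at a node that is not the true nearest neighbor, which is where the "approximate" in ANN comes from.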
- Java
- Concurrent Collections
- Multithreading
- ANN Retrieval Concepts
- REST APIs (optional)
- Benchmark Evaluation Framework
git clone https://github.com/keshav12280-blip/Adaptive-Router-Vector_Query.git
cd Adaptive-Router-Vector_Query

Compile:

javac Main.java

Run:

java Main

| Threads | Query Latency | Recall@10 |
|---|---|---|
| 10 | 12 ms | 91% |
| 50 | 28 ms | 89% |
| 100 | 45 ms | 87% |

| Concurrent Inserts | Throughput |
|---|---|
| 100/sec | 5.2K ops/s |
| 500/sec | 4.8K ops/s |
| 1000/sec | 4.1K ops/s |

| Concurrent Queries | Avg Latency |
|---|---|
| 100 | 18 ms |
| 500 | 37 ms |
| 1000 | 63 ms |
Evaluation was performed using synthetic embedding datasets under concurrent retrieval and insertion workloads.
The system was tested with varying thread counts to analyze:
- Query latency
- Retrieval recall
- Insert throughput
- Concurrent system stability
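
Retrieval recall is usually reported as Recall@k: the fraction of the true k nearest neighbors that appear in the top k results returned by the approximate search. A minimal sketch of that calculation follows. The class `RecallSketch` is hypothetical, and it assumes the ground-truth list comes from an exact search (for example a brute-force scan) ordered nearest first.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for computing Recall@k. */
class RecallSketch {
    /**
     * @param retrieved   ids returned by the approximate search, best first
     * @param groundTruth ids of the exact nearest neighbors, nearest first
     * @return fraction of the top-k ground-truth ids found in the top-k retrieved ids
     */
    static double recallAtK(List<Integer> retrieved, List<Integer> groundTruth, int k) {
        Set<Integer> truth =
                new HashSet<>(groundTruth.subList(0, Math.min(k, groundTruth.size())));
        if (truth.isEmpty()) {
            return 0.0; // nothing to recall
        }
        long hits = retrieved.stream()
                .limit(k)
                .filter(truth::contains)
                .distinct()
                .count();
        return (double) hits / truth.size();
    }
}
```

For example, if the approximate search returns ids 1, 2, 3, 4 and the exact top four are 1, 2, 5, 6, two of the four true neighbors were found and Recall@4 is 0.5.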
- Measure scalability under concurrent workloads
- Analyze latency degradation patterns
- Evaluate routing efficiency
- Compare snapshot-only vs adaptive routing retrieval
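
One way to measure latency at different thread counts is to run the same query workload on fixed-size thread pools of increasing size and record per-query timings. The sketch below assumes that approach. `LatencyBenchmarkSketch` and `averageLatencyMillis` are hypothetical names, and the harness actually used for the reported numbers may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical micro-harness for average query latency under concurrency. */
class LatencyBenchmarkSketch {
    /**
     * Runs {@code queries} invocations of {@code query} on {@code threads} workers
     * and returns the mean execution time per query in milliseconds.
     * Time spent waiting in the pool's queue is not included.
     */
    static double averageLatencyMillis(int threads, int queries, Runnable query) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> timings = new ArrayList<>();
            for (int i = 0; i < queries; i++) {
                timings.add(pool.submit(() -> {
                    long start = System.nanoTime();
                    query.run();
                    return System.nanoTime() - start;
                }));
            }
            long totalNanos = 0;
            for (Future<Long> timing : timings) {
                totalNanos += timing.get();
            }
            return totalNanos / 1e6 / queries;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("query failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling this once per thread count (for example 10, 50, and 100) with the same query workload gives a latency curve. Running it against a snapshot-only path and an adaptive-routing path would give the comparison named in the last goal. A serious benchmark would also add warm-up runs and report percentiles, not just the mean.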
This project explores the intersection of:
- Retrieval Systems
- Concurrent Computing
- Distributed Search
- Vector Databases
- ANN Search Architectures
- Scalable Intelligent Systems
- Distributed vector indexing
- GPU acceleration
- Adaptive graph pruning
- Hybrid retrieval strategies
- Dynamic load balancing
- Incremental ANN optimization
- Semantic search systems
- Recommendation engines
- Retrieval-Augmented Generation (RAG)
- Real-time vector databases
- Large-scale retrieval infrastructure
Keshav Gupta
- B.Tech Computer Science, Delhi Technological University
- Research Assistant at IIIT Delhi
- Software Engineer focused on scalable backend systems and retrieval architectures
MIT License
