Search-Embedding-lab

📦 Overview

An end-to-end dense retrieval system for semantic search, focused on training embedding models from scratch, evaluating retrieval quality, and iterating on architecture design. The goal is to mirror how modern AI-native search systems are built: crawl → embed → index → retrieve → evaluate.

🧠 Approach

```
Query ──► Query Encoder ─┐
                         ├─► Similarity (dot / cosine) ─► Ranking
Docs ───► Doc Encoder ───┘
```

Training: (Query, Positive Doc, Hard Negatives) → InfoNCE Contrastive Loss
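The training objective can be sketched as follows. This is a minimal NumPy illustration of InfoNCE over one (query, positive, hard negatives) triple, not the repo's actual training code; the function name and the temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(query_emb, pos_emb, neg_embs, temperature=0.05):
    """InfoNCE over one triple: the positive doc competes with hard negatives.

    query_emb: (d,), pos_emb: (d,), neg_embs: (n, d).
    Temperature 0.05 is a common default, not taken from this repo.
    """
    # Cosine similarity via L2-normalised dot products
    q = query_emb / np.linalg.norm(query_emb)
    docs = np.vstack([pos_emb, neg_embs])             # positive at row 0
    docs = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    logits = docs @ q / temperature                   # (1 + n,) scores
    # Cross-entropy with the positive as the target class (index 0)
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return -log_probs[0]
```

The loss is low when the query embedding sits closer to its positive document than to any hard negative, which is exactly the geometry the retriever needs at inference time.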

Inference: Query → FAISS ANN → Top-K Documents
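The inference step reduces to a nearest-neighbour lookup in embedding space. The sketch below uses brute-force inner-product search as a stand-in for the FAISS index: `faiss.IndexFlatIP` returns identical results, while approximate FAISS indexes (IVF, HNSW) trade exactness for speed. The function name is illustrative.

```python
import numpy as np

def top_k(query_emb, doc_embs, k=5):
    """Return (indices, scores) of the k docs with the highest inner product.

    Brute-force stand-in for the FAISS ANN lookup; doc_embs is the
    (n_docs, d) matrix produced by the doc encoder.
    """
    scores = doc_embs @ query_emb       # inner-product similarity, shape (n_docs,)
    idx = np.argsort(-scores)[:k]       # k best, highest score first
    return idx, scores[idx]
```

If both encoders L2-normalise their outputs, the inner product equals cosine similarity, so the same index serves either scoring choice.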

📈 Findings

```
=== Retrieval metrics ===
K=  1   Recall@K=0.33   MRR=0.55   nDCG@K=0.66
K=  5   Recall@K=1.00   MRR=0.55   nDCG@K=0.66
K= 10   Recall@K=1.00   MRR=0.55   nDCG@K=0.66
K=100   Recall@K=1.00   MRR=0.55   nDCG@K=0.66
n_queries_eval: 6
```
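For reference, the three metrics can be computed per query as below, assuming binary relevance judgments (per-query averages give the table values). Function names are illustrative, not from the repo.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant docs found in the top k results."""
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

def mrr(ranked_ids, relevant_ids):
    """Reciprocal rank of the first relevant doc (0 if none retrieved)."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: DCG of the ranking over DCG of an ideal one."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked_ids[:k], start=1)
              if doc in relevant_ids)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(len(relevant_ids), k) + 1))
    return dcg / ideal
```

Recall@K saturating at 1.00 by K=5 while MRR stays at 0.55 says the relevant doc is almost always retrieved, but often not ranked first; note the evaluation set here is only 6 queries.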