Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math Paper • 2602.06291 • Published 12 days ago • 23
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 163
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator in Sentence Transformers v5.2+. They also work with CrossEncoderNanoBEIREvaluator if the dataset has a bm25 column • 18 items • Updated 18 days ago • 14
Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model Paper • 2507.05513 • Published Jul 7, 2025 • 1
Article RexRerankers: SOTA Rankers for Product Discovery and AI Assistants 24 days ago • 44
KoViDoRe Benchmark (BEIR) v2 Collection Korean Vision Document Retrieval Benchmark • 6 items • Updated Jan 15 • 5
Article Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries Dec 22, 2025 • 9
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model Paper • 2501.01028 • Published Jan 2, 2025 • 19
Tarka Embed V1 Collection Efficient DFKD embeddings for language understanding • 5 items • Updated Dec 17, 2025 • 6
Black-Box On-Policy Distillation of Large Language Models Paper • 2511.10643 • Published Nov 13, 2025 • 52
Preserving Multilingual Quality While Tuning Query Encoder on English Only Paper • 2407.00923 • Published Jul 1, 2024 • 1
Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks Paper • 2511.07025 • Published Nov 10, 2025 • 14
Nemotron RAG Collection Set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex docs • 9 items • Updated 13 days ago • 72
Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought Paper • 2510.04230 • Published Oct 5, 2025 • 27