KuKu

dragonkue

AI & ML interests

anything.

Recent Activity

liked a dataset 6 days ago

lance-format/fineweb-edu

upvoted a collection 11 days ago

Embedding Model Datasets

upvoted a paper about 5 hours ago

Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

Paper • 2602.06291 • Published 12 days ago • 23

upvoted 2 collections 11 days ago

Embedding Model Datasets

A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 163

NanoBEIR datasets

These datasets are compatible with the (Sparse)NanoBEIREvaluator in Sentence Transformers v5.2+, and with the CrossEncoderNanoBEIREvaluator if a bm25 column is present • 18 items • Updated 18 days ago • 14

upvoted a collection 12 days ago

Codefuse Embeddings

9 items • Updated 6 days ago • 8

upvoted a paper 19 days ago

Llama Nemoretriever Colembed: Top-Performing Text-Image Retrieval Model

Paper • 2507.05513 • Published Jul 7, 2025 • 1

upvoted an article 22 days ago

RexRerankers: SOTA Rankers for Product Discovery and AI Assistants

Article • 24 days ago • 44

upvoted an article about 1 month ago

How We Built a Semantic Highlight Model To Save Token Cost for RAG

Article • Jan 15 • 65

upvoted a collection about 1 month ago

KoViDoRe Benchmark (BEIR) v2

Korean Vision Document Retrieval Benchmark • 6 items • Updated Jan 15 • 5

upvoted an article about 2 months ago

Nano-BEIR: A Multilingual Information Retrieval Benchmark with Quality-Enhanced Queries

Article • Dec 22, 2025 • 9

upvoted a paper 2 months ago

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Paper • 2501.01028 • Published Jan 2, 2025 • 19

upvoted an article 2 months ago

We Got Claude to Fine-Tune an Open Source LLM

Article • Dec 4, 2025 • 593

upvoted a collection 3 months ago

Tarka Embed V1

Efficient DFKD embeddings for language understanding • 5 items • Updated Dec 17, 2025 • 6

upvoted 4 papers 3 months ago

Black-Box On-Policy Distillation of Large Language Models

Paper • 2511.10643 • Published Nov 13, 2025 • 52

Preserving Multilingual Quality While Tuning Query Encoder on English Only

Paper • 2407.00923 • Published Jul 1, 2024 • 1

Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks

Paper • 2511.07025 • Published Nov 10, 2025 • 14

KORMo: Korean Open Reasoning Model for Everyone

Paper • 2510.09426 • Published Oct 10, 2025 • 86

upvoted a collection 4 months ago

Nemotron RAG

Set of tools to build retrieval-augmented generation (RAG) systems, improve search and ranking accuracy, and extract structured data from complex docs • 9 items • Updated 13 days ago • 72

upvoted 2 articles 4 months ago

Vocabulary is the most important element of Sparse Retrieval

Article • Oct 4, 2025 • 10

The Past and Present of Sparse Retrieval

Article • Oct 4, 2025 • 6

upvoted a paper 4 months ago

Pushing on Multilingual Reasoning Models with Language-Mixed Chain-of-Thought

Paper • 2510.04230 • Published Oct 5, 2025 • 27